ABSTRACT
used in the district's standardized achievement tests; (3) lack of
motivation on the part of students to do their best on the
standardized achievement tests; and (4) inappropriate administration
of the standardized achievement tests. Materials and procedures
designed to reduce or eliminate the influence of these confounding
factors included a series of filmstrips, audiotapes, and workbooks to
teach students test-taking skills; a set of seven practice tests; a
self-charting procedure designed to motivate students' test-taking;
and workshops and exercises to improve teachers' skills as
standardized test administrators. Each of these components is
described in detail. The contradictions with previous research and
teachers' perceptions of the value of the materials and procedures
suggest further evaluation of the project materials is necessary
before final conclusions are drawn.






An Evaluation of Training

in 

Standardized Achievement Test Taking 

and Administration 




Final Report of the 1981-82 
Utah State Refinements to the ESEA 
Title I Evaluation and Reporting System



Karl White, Cie Taylor, Susan Friedman, David Bush, and Kathy Stewart 






Project conducted by: 

Utah State University and Utah State Office of Education 






FINAL REPORT 

of

STATE REFINEMENTS TO THE ESEA 

TITLE I 

EVALUATION AND REPORTING SYSTEM 
UTAH 1981-82 PROJECT 



FEBRUARY, 1983 



ACKNOWLEDGMENTS 



The Research and Development work reported in this document would not
have been possible without the help of literally hundreds of people. Although
all of them cannot be mentioned by name, a few deserve special mention.
First, Darrel Allington of Granite School District was the creative drive
behind Professor Owl and the unique format of the filmstrips used in the
student training materials. His extra efforts in the beginning of the project
and his patient tutoring and assistance in the later stages are what brought
the student training materials into existence. Second, the District
Coordinators in Granite (Maurice Wilkinson), Nebo (William Rust), Cache (Keith
Clayson), and Logan (Gary Carlston) districts paved the way for their
districts' participation and served as liaisons between the project and their
districts. Third, dozens of individual teachers added another concern to an
already busy schedule by participating in the project. Their patient
understanding and willing participation in spite of changing schedules and
unforeseen mishaps are what really enabled the project to be completed.
Fourth, the staff at the Utah State Office of Education, particularly Jay
Donaldson, Bill Cowan, and Kent Worthington, provided constant support and
encouragement. And finally, the other members of the project staff who do not
appear as authors of this final report devoted hundreds of hours to assuring
a quality product, implementation, and evaluation: Marilyn Tinnakul,
J. C. Cole, Byron Bair, Edward Konat, and Heather Nairn.

The materials developed in this project and the contents of this report
were supported in part by funds from the U.S. Department of Education in
conjunction with Contract No. 300810271 (State Refinements to the ESEA Title I
Evaluation and Reporting System). The contents do not necessarily reflect the
views or policies of the Department of Education, nor does the mention of
trade names, commercial products, or organizations reflect endorsement by the
U.S. Government.






TABLE OF CONTENTS

EXECUTIVE SUMMARY

CHAPTER

I. OVERVIEW OF STUDY
    Objectives
    Importance and Benefits of Project

II. REVIEW OF RELATED RESEARCH
    Procedures
    Meta-Analysis Defined
    Procedures for Meta-Analysis
    Previous Reviews
    Meta-Analysis of Research on Reinforcement
        A Typical Study
        Results of the "Reinforcement" Meta-Analysis
        Summary
        Conclusions
    Meta-Analysis of Research on Training Students in Test-Taking
        Definition of Training
        Previous Reviews
        Typical Studies
        Results of the Meta-Analysis
        Summary
        Conclusions
    Review of Research Related to Training Teachers in Test Administration
        Previous Reviews
        Test Anxiety
        Examiner/Examinee Relationships
        Information About the Test
        Mechanics of Test Taking
        Environmental Factors
        Summary
        Conclusions
    Summary



III. PROCEDURES
    Description of Materials
        Filmstrips and Workbooks: Teaching Students How to Take Tests
        Practice Tests
        Reinforcement Procedures
        Training Teachers in Standardized Test Administration
        Administration Procedures Specific to a Particular Standardized Test
        Summary
    Sample for Research
        Identification and Selection of Sample
        Assignment of Sample to Groups
    Implementation of Experimental Treatments
        Teacher Training in Test Administration
        Student Training in Test-Taking Skills
    Instrumentation
        Standardized Achievement Tests
        Student and Teacher On-Task Behavior
        Locally Developed Instruments
    Summary

IV. RESULTS AND CONCLUSIONS
    Effectiveness of Training Materials
        Teachers' Perceptions
        Differences Between Groups on Outcome Variables
        Conclusions
    Implementation Factors Possibly Related to Results
    Suggestions for Future Research

REFERENCES






APPENDICES

A. Materials Related to Review Literature
B. Materials Related to Development of Filmstrips
C. Materials Related to Development of Practice Tests
D. Materials Related to Reinforcement Procedures
E. Materials Related to Sample Selection and Description
F. Materials Related to Implementation of Training Materials
G. Materials Related to Instrumentation
H. Supplementary Data About Effect of Intervention on Dependent Variables



LIST OF TABLES

1. Summary of Previous Reviews
2. Categories for Describing Reinforcement Studies, Number of Effects in Each Category, and Mean Effect Size
3. Mean Effect Size by IQ for Contingency, Quality, Age, Design, Test Type, and Number of Subjects
4. Mean ES by Unit of Test Administration for Type of Test
5. Categories for Describing Student Training Studies, Number of Studies in Each Category, and the Mean Effect Size
6. Mean ES by Quality for Type of Training and Number of Subjects
7. Mean ES by Type of Test and Quality of Research Design for Type of Training, Unit, Age, Design, and IQ
8. Mean ES for Studies Coded "Achievement" by Quality, Test, Age, and Training
9. Use of Tests in Utah Title I Projects
10. Number of Test Adoptions for Districts and States by Region
11. Summary of Test Use for Project Tests
12. Subtests
13. Objectives for Student Training Filmstrips
14. Time Line for Making Filmstrips and Tapes
15. The Mean Number of Items and Minutes Used for Each Practice Test
16. Computations Used to Determine Number of Items to Include in a 30-Minute Practice Test
17. Description of Districts Participating in Project
18. Experimental Sample
19. Implementation of Experimental Treatments
20. Actual Timeline for Implementing Filmstrips, Practice Tests, and Teacher Supervision

21. Training in Test Administration Breakdown by District
22. Fall Workshop Evaluation Data
23. Workshop in Student Training Implementation Breakdown by District
24. Teacher Training in Student Curriculum Workshop
25. Number of Contacts Between Project Staff and District Staff
26. Number of Classrooms, Students Present, and Students Absent for Each Filmstrip
27. Number of Classrooms, Students Present, and Students Absent for Each Practice Test
28. Summary of Filmstrip Evaluations
29. Results from Teacher Evaluation: Project Components
30. Verbal Comments from Teachers on Project
31. Mean Ratings Given Teachers for Support and Quality
32. Standardized Test Formats
33. Test Statistics on Data Collection Instruments Developed by Project
34. Breakdown for Observations by Number of Classes
35. Practice Data Collection: Percent of Interrater Agreement for Quality of Test Administration
36. Practice Data Collection: Percent of Interrater Agreement for On-Task Behavior During Teacher- and Student-Directed Tests
37. Actual Data Collection: Percent of Interrater Agreement for Quality of Test Administration
38. Actual Data Collection: Percent of Interrater Agreement for On-Task Behavior During Teacher- and Student-Directed Tests
39. Intercorrelations of Dependent Measures
40. Scores on Dependent Variables by Experimental Group
41. Total Achievement Test Scores by Group by Various Independent Measures
42. Intercorrelation Matrix for Project Variables
43. Third Grade Standardized Achievement Test Scores from 1981-82 Year



LIST OF FIGURES

1. Distribution of 41 Effect Sizes for Reinforcement Studies Considered in the Meta-Analysis
2. Distribution of 62 Effect Sizes from Student Training Studies Considered in the Meta-Analysis
3. The Time Period for Producing Each Practice Test
4. Basic Definitions for On-Task Behavior
5. Box and Whisker Diagrams and Normal Curve Representations for Major Dependent Variables
6. Box and Whisker Diagrams for Dependent Variables Using Teacher as Unit of Analysis



EXECUTIVE SUMMARY 



Testing is a major industry for the American educational system. The
National Education Association estimates that approximately 200 million
achievement tests are administered annually in the United States (McKenna,
1973). Of the three or four published tests students take each year, the
majority are standardized measures. Scores from these tests are used as a
primary source of information in making decisions about educational
programming, class placement, student advancement, and evaluations of
educational programs. Given that standardized achievement tests are used to
make such important decisions, it is essential to be sure the tests are
really measuring what they purport to measure.

Unfortunately, previous research suggests that several other factors may
be confounded with the scores students receive on standardized achievement
tests. To the degree that such factors are influencing the scores students
receive, decisions based on the results of standardized achievement tests may
be misleading and/or inappropriate. Some of the potentially confounding
factors identified in previous research include the following:

• Administration Procedures. Teachers who do not follow standardized
test administration procedures in administering the test may cause
students' scores to be higher or lower than they would otherwise be.
Students may receive higher scores than they deserve if the test is
not properly monitored, if inappropriate hints or assistance are
given, or if cheating is not carefully controlled. Students may
receive lower scores than they should if directions are not given
clearly, if they are not properly prepared for the test, or if a
nonsupportive or anxiety-provoking atmosphere is maintained.

• Student Test-Taking Skills. Several previous research studies have
suggested that mastery of test-taking skills such as checking work,
using elimination strategies, timing, and following directions is
positively related to student scores.

• Test Format. The format used by different standardized achievement
tests to assess a student's mastery of the same content area is often
radically different. There are indications that unfamiliar test
formats may be confusing for students. Such confusion may result in
lower test scores even though students know the material being
tested.

• Student Motivation. Students who are not motivated to do well on
standardized achievement tests will probably receive lower scores
than they would have if they had tried their best. When this
happens, the test is at least partly a measure of student motivation,
even though the decisions based on test scores assume that the test
is solely a measure of achievement or mastery of a particular content
area.



Objectives

To the degree that these previous research findings are correct, student
scores on standardized achievement tests will be invalid because factors
other than what the student knows (e.g., familiarity with format,
administration procedures, motivation, and test-taking skills) will influence
scores on the test. Based on the findings of previous research described
above, this project developed, implemented, and examined the effect of
instructional materials and procedures designed to eliminate the influence on
test scores of the following four factors:

1. Differential levels of test-taking skills on the part of students.

2. Students' lack of familiarity with, and consequent confusion from, the
question format used in the district's standardized achievement
tests.

3. Lack of motivation on the part of students to do their best on the
standardized achievement tests.

4. Inappropriate administration of the standardized achievement tests.



Materials and Procedures 

Materials and procedures designed to reduce or eliminate the influence
of the confounding factors described above included a series of nine
filmstrips, audiotapes, and workbooks to teach students test-taking skills; a
set of seven practice tests formatted similarly to the standardized
achievement test used by the district; a self-charting procedure designed to
motivate students to try their best on tests; and workshops and exercises
designed to improve teachers' skills as standardized test administrators.
Each of these components is summarized briefly below and described in detail
in the project's Final Technical Report.¹

Filmstrips for teaching test-taking skills. Nine filmstrips (lasting
approximately 30 minutes each), audiotapes, and workbooks were developed to
teach students test-taking skills such as checking their work, filling in
answer spaces correctly, following directions, differentiating between
correct and look-alike answers, using different question formats, and using
partial knowledge to eliminate wrong answers. All of the test-taking skills
instruction focused on standardized reading achievement tests. Filmstrips
were designed so that the lights remained on during the filmstrip, and the
following instructional principles were emphasized:

• The teacher interacted with the filmstrip, controlling the pace of
instruction, checking student mastery, supplementing instruction when
necessary, and demonstrating correct performance.

• Students were actively involved in the instruction: workbooks were
completed during the filmstrip, vocal responses were used frequently, and
teachers were instructed not to proceed to new material until all students
had demonstrated mastery.

Practice tests. Seven practice tests (ranging in length from 5 to 30
minutes) were developed using the same format as each district's
standardized achievement test. Content of the practice tests paralleled what
was being taught to students in their reading groups during the year. The
practice tests provided an opportunity for students to practice the concepts
being taught in the filmstrips, to become familiar with the format and
standardized testing procedures (teachers administered each practice test
using written directions similar to those of a standardized test), and to
learn to work independently in a standardized testing situation. By the time
students took the standardized test in the spring, it was hoped that
standardized testing procedures would be a familiar and comfortable
experience. In addition, these practice tests were designed so teachers
obtained feedback at periodic intervals on student mastery, which could be
useful in designing their classroom instruction.

¹Copies of this report are available from the United States Department
of Education (Reference Contract #300810271), the Utah State Office of
Education, or the ERIC Document Reproduction Service.

Motivating students to try their best. Scores on the practice tests
were also used as a basis for motivating students to try their best on
standardized achievement tests. A self-charting procedure was used, with each
student's individual chart prominently displayed. Each student received
points to be put on the chart for improving his or her score from the
previous practice test (students scoring above 80% on a practice test were
always given points). It was hypothesized that if students learned to try
their best on the "standardized" practice tests, this motivation would
transfer to the actual standardized achievement test given in the spring.
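The point rule described above can be written as a short function. This is a minimal sketch under stated assumptions: the report does not say how many points each award was worth or how the first practice test was handled, so the hypothetical `award_points` function gives one point per award and, on the first test (which has no previous score to beat), awards a point only for a score above 80%.

```python
def award_points(scores, threshold=80.0, points_per_award=1):
    """Points earned on each practice test under the self-charting rule.

    Hypothetical sketch: one award per test; no improvement credit on the
    first test because there is no previous score to compare against.
    """
    awards = []
    previous = None
    for score in scores:
        improved = previous is not None and score > previous
        above_threshold = score > threshold  # "above 80%" read as strictly greater
        awards.append(points_per_award if improved or above_threshold else 0)
        previous = score
    return awards


# One student's percentage scores on five practice tests.
print(award_points([60, 70, 65, 85, 82]))  # [0, 1, 0, 1, 1]
```

The fifth test still earns a point even though the score dropped, because any score above 80% is always rewarded.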

Training teachers in standardized test administration. Teachers were
trained in two workshops (one in the fall, one in the spring) to be better
standardized test administrators. During the workshops, teachers were
instructed in standardized testing procedures, critiqued videotapes of good
and bad test administration, and role-played various aspects of test
administration. Examples of the concepts emphasized included student
seating arrangements, preparing for early finishers, clarifying ambiguous
directions and making sure all students understand directions, and
facilitating a supportive and properly controlled atmosphere.



Experimental Design 

To test the effect of the training materials on students' and teachers'
performance during standardized achievement tests, 58 classrooms from three
school districts in Utah were randomly assigned to one of three groups:

• Experimental Group 1 classrooms received all of the training
materials (filmstrips, practice tests, motivation procedures, and
training in test administration).

• Experimental Group 2 classrooms received only the student training
materials (filmstrips and practice tests).

• Control group classrooms received no specially prepared materials
concerning the administration of the standardized achievement tests.
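The summary states only that classrooms were randomly assigned; it does not say whether assignment was blocked by district or constrained to equal group sizes. The sketch below assumes simple random assignment with near-equal group sizes, and the function name and group labels are hypothetical. Because whole classrooms are assigned, the classroom rather than the individual student is the unit of randomization.

```python
import random


def assign_classrooms(classroom_ids, groups=("E1", "E2", "C"), seed=None):
    """Randomly assign classrooms to groups with near-equal group sizes."""
    rng = random.Random(seed)  # a seeded generator makes the assignment reproducible
    shuffled = list(classroom_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled classrooms out in turn, so group sizes differ by at most one.
    return {cid: groups[i % len(groups)] for i, cid in enumerate(shuffled)}


assignment = assign_classrooms(range(1, 59), seed=1981)
sizes = {g: list(assignment.values()).count(g) for g in ("E1", "E2", "C")}
print(sizes)  # {'E1': 20, 'E2': 19, 'C': 19}
```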

Project staff at Utah State University provided extensive supervision
and assistance to each of the experimental group classrooms participating in
the project, including training workshops, on-site modeling of materials,
periodic on-site follow-up and assistance, and telephone consultation. Each
teacher in the experimental groups was visited an average of five times
during the year in addition to the training workshops. Teachers also received
an average of 7.9 telephone consultations each.

The effectiveness of the project in teaching elementary school students 
test-taking skills, motivating students to do their best on standardized 
achievement tests, and training teachers in the proper administration of 
standardized achievement tests was assessed based on data collected for each 
of the three groups in the following areas: 

1. Teachers' responses to questionnaires and interviews to assess their
perceptions of the value of the materials and the quality of
implementation.

2. Students' scores on the district's standardized achievement test.

3. Observations by blind observers of student and teacher on-task
behavior during the standardized achievement test.

4. Student and teacher attitudes towards standardized achievement tests
as measured by paper-and-pencil attitude instruments developed by the
project.

5. Ratings by blind observers of the quality of the teacher's test
administration.

The actual measures used to collect data about the project are described
in greater detail in the project's Final Technical Report. These
measures consisted of a series of standardized and locally developed measures
and observation systems. The standardized achievement tests were
administered in each class by the classroom teacher, as was the general
practice of the participating districts. All other data were collected by
specially trained data collectors who were uninformed as to the nature of
the research or the group membership of any of the classes.

Results

Teachers' perceptions. Most components of the project, particularly
filmstrips and practice tests, were viewed very positively by teachers. For
example:

• 84.2% of the teachers felt the filmstrips were worth the time and
effort required.

• 78.9% plan to use the filmstrips next year.

• 94.7% felt the filmstrips taught concepts which were important for
students to learn.

• 79% felt the practice tests adequately prepared the students for
taking the standardized achievement test.

• 76.3% plan to use the practice tests in the future.

• 76.3% felt the benefits of the total project were worth the
investment in time.

• 73.7% of the teachers felt the project was enjoyable for students.

• 81.6% of the teachers felt the project benefited the students'
test-taking skills.

Teachers' perceptions of the value of the procedures for teaching
standardized test administration skills were also very positive. Seventy-one
percent of the participating teachers felt they were better test
administrators as a result of the workshops. However, the procedures used to
motivate students to try their best on tests were viewed less positively.
Slightly more than half the teachers (53.3%) felt that the motivational
procedures were difficult for students to understand. Only 38.1% felt that
the students were motivated by the procedures, and only 38.3% of the teachers
plan to use the motivational procedures in the future.

Teacher and student attitudes and behaviors. Table 1 includes data for
all of the major outcome measures for each of the three groups ("1" refers to
Experimental Group 1, "2" refers to Experimental Group 2, and "C" refers to
the control group). As can be noted, teachers participating in the project
had improved attitudes towards standardized achievement tests, particularly
those teachers in Experimental Group 1 who received the training in
standardized test administration. Teachers' on-task behavior and quality of
test administration were also significantly improved as a result of the
project. Differences between groups on student attitudes towards
standardized tests were statistically significant, favoring the control
group. However, in practical terms, these differences were very small. There
were no differences between the groups on student on-task behavior during the
test.

Academic achievement. There were statistically significant differences
between the groups on all of the achievement test scores, with Experimental
Group 2 scoring the highest. Although statistically significant, the
differences between the groups are relatively small (an average of less than
one-quarter of a standard deviation). However, it is important to note that
students in Experimental Group 1, who received the most intervention,
consistently scored the lowest on the standardized achievement tests.




Table 1 



Scores for Each Group on Major Dependent Measures 
(Student-Directed Subtest)

ᵃFor each standardized achievement test, a teacher-directed subtest was
defined as one where the teacher gave directions and controlled the pace item by
item. A student-directed subtest was defined as one where directions were given
at the beginning of the subtest and then students worked independently for a
set time limit or until they finished.



ᵇAll probability estimates are based on one-way analyses of variance between
the means of the three groups. In many cases, distributions are substantially
skewed, so medians are a better indicator of central tendency; medians for each
group on all variables are therefore also reported. Asterisks indicate where the
order of groups differs depending on whether means or medians are reported. The
order of groups shown in the chart always follows the medians.

ᶜThe column labeled ES reports the standardized mean difference between the
highest and lowest groups, $ES = (\bar{X}_{\text{high}} - \bar{X}_{\text{low}}) / SD_{\text{control}}$,
where $SD_{\text{control}}$ is the standard deviation of the control group. This
measure has been recommended by Glass (1977) for examining the results of various
studies using a common metric.
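The table notes describe three computations: a one-way analysis of variance across the three groups, a check on whether ranking the groups by mean gives a different order than ranking by median (the cases marked with an asterisk), and the ES (highest group mean minus lowest group mean, divided by the control group's standard deviation). The standard-library sketch below illustrates all three on made-up scores; the function names are hypothetical, the p-value (which needs the F distribution) is left out, and the sample (n - 1) standard deviation is an assumption, since the notes do not say which form was used. SciPy's `scipy.stats.f_oneway` would return both the F statistic and its p-value.

```python
from statistics import mean, median, stdev


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    `groups` maps a group label (e.g. "1", "2", "C") to a list of scores.
    """
    scores = [x for g in groups.values() for x in g]
    grand_mean = mean(scores)
    means = {name: mean(g) for name, g in groups.items()}
    # Between groups: group size times squared deviation of the group mean.
    ss_between = sum(len(g) * (means[name] - grand_mean) ** 2 for name, g in groups.items())
    # Within groups: squared deviation of each score from its own group mean.
    ss_within = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)
    df_between, df_within = len(groups) - 1, len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def order_differs(groups):
    """True when ranking groups by mean gives a different order than by median."""
    by_mean = sorted(groups, key=lambda g: mean(groups[g]), reverse=True)
    by_median = sorted(groups, key=lambda g: median(groups[g]), reverse=True)
    return by_mean != by_median


def effect_size(groups, control="C"):
    """(highest group mean - lowest group mean) / control-group standard deviation."""
    group_means = [mean(g) for g in groups.values()]
    return (max(group_means) - min(group_means)) / stdev(groups[control])


scores = {"1": [1, 2, 3], "2": [2, 3, 4], "C": [3, 4, 5]}
print(one_way_anova_f(scores))  # (3.0, 2, 6)
print(effect_size(scores))  # 2.0
```

On this scale, the summary's "less than one-quarter of a standard deviation" corresponds to an ES below 0.25.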
Subgroup analyses. To check the robustness of the findings reported
above when all students were included in the analyses, several additional
analyses were done using subgroups of students. These subgroup analyses were
done for the following groups of students across all dependent variables:

• Students who received the majority of the experimental treatment.
These analyses were done eliminating those students who saw fewer than
5 filmstrips, took fewer than 3 practice tests, had teachers who were
rated low on quality of implementation or support, were in special
education programs, or had English as a second language.

• Students who received all of the experimental treatment. These
analyses were done eliminating those students who saw fewer than 9
filmstrips, took fewer than 7 practice tests, had teachers who were
rated low on quality of implementation or support, were in special
education programs, or had English as a second language.

• Only Title I students who received all of the treatment.

• Students in each of the three participating districts, analyzed
separately.
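The first three subgroups are filtering rules, and writing them out makes the thresholds explicit: "fewer than 5 filmstrips" excludes a student, so keeping a student requires at least 5. The sketch below assumes a hypothetical per-student record; the field names are illustrative rather than the project's actual data layout, and the two teacher ratings (implementation and support) are collapsed into a single flag for brevity. The per-district analysis is an ordinary group-by and is omitted.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical record; field names are illustrative only.
    filmstrips_seen: int
    practice_tests_taken: int
    teacher_rated_low: bool  # teacher rated low on implementation or support
    special_education: bool
    english_second_language: bool
    title_i: bool = False


def received_treatment(s, min_filmstrips, min_tests):
    """Exclusion rules shared by the 'majority' and 'all' subgroups."""
    return (
        s.filmstrips_seen >= min_filmstrips
        and s.practice_tests_taken >= min_tests
        and not s.teacher_rated_low
        and not s.special_education
        and not s.english_second_language
    )


def majority_subgroup(students):
    return [s for s in students if received_treatment(s, 5, 3)]


def full_treatment_subgroup(students):
    return [s for s in students if received_treatment(s, 9, 7)]


def title_i_full_treatment_subgroup(students):
    return [s for s in full_treatment_subgroup(students) if s.title_i]


roster = [
    Student(9, 7, False, False, False, title_i=True),
    Student(6, 4, False, False, False),
    Student(9, 7, True, False, False),
]
print(len(majority_subgroup(roster)), len(full_treatment_subgroup(roster)))  # 2 1
```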

The results of the subanalyses (reported in detail in the project's Final
Technical Report) confirmed in all cases the results reported above in Table 1.

Conclusions 

The purpose of this project was to develop, implement, and evaluate the
effect of training materials and procedures designed to increase the validity
of standardized achievement tests by improving:

• students' test-taking skills, attitudes toward tests, and motivation;
and

• teachers' attitudes toward standardized tests and quality of test
administration.

As noted briefly above, the intervention procedures did result in improved
teachers' attitudes towards tests and quality of test administration.
Furthermore, teachers were enthusiastically supportive of the materials, plan
to continue using the materials in the future, and felt that the materials
resulted in substantial improvements in students' test-taking abilities and
students' attitudes towards tests. However, the more objective data
collected by the project indicated that there were no meaningful increases in
students' test-taking skills or in students' attitudes or performance during
tests.

These data raise some perplexing questions in view of previous research
which has supported the efficacy of the types of intervention developed in
this project, and in view of teachers' perceptions about the effectiveness of
the project. First, previous research has indicated that training students
in test-taking skills has a substantial effect on test scores. When compared
to the interventions in previous research, the training delivered to students
in this project was a relatively intense, systematically delivered training
experience of long duration with good follow-up and monitoring. In spite of
this, no meaningful differences were observed between the groups on test
scores. Most differences which were observed were not in the predicted
direction. In fact, those students who received the most training received
the lowest scores.

The fact that differences were not found is even more perplexing in
light of teachers' very positive response to the program materials. Most
teachers who used the materials during this year plan to continue using the
materials in the future and felt that the materials had improved their
students' attitudes and increased their performance on standardized
achievement tests. However, none of these perceived differences were apparent
on the objective measures for which data were collected by the project.
The contradictions with previous research and teachers' perceptions of
the value of the materials and procedures suggest that further evaluation of
the materials developed in this project should be conducted before final
conclusions are drawn. Further research is necessary to understand to what
degree typically administered standardized achievement tests are valid and
useful for the purposes for which they are usually used. The materials
developed in this project represent an important beginning. As they are used
further and more data are collected, we will be able to better understand the
degree to which results from standardized tests should and can be used to
make programming, evaluation, and placement decisions for primary grade
children.



FINAL REPORT:

STATE REFINEMENTS TO THE ESEA TITLE I EVALUATION
AND REPORTING SYSTEM

CHAPTER I

OVERVIEW OF THE STUDY

The general purpose of RFP 81-034 was to support further development
work by State Education Agencies (SEAs) to "enhance the quality of Title I
evaluation at the state and school district level." The project described in
this report accomplished this overall objective by addressing the following
two specific areas targeted by RFP 81-034:

• Quality Control. Efforts designed to improve the accuracy and
validity of the Title I evaluation data currently being collected.

• Measurement and Evaluation. Studies designed to investigate
technical aspects of the current evaluation models . . .

The Title I Evaluation and Reporting System (TIERS) was designed to
provide decision makers at all levels with information about:

". . . the effectiveness of the programs assisted under this Title in
meeting the special educational needs of educationally deprived
children; . . . such evaluations will include . . . objective
measurement of educational achievement in basic skills . . ." (ESEA,
Title I, Section 124 (G)).¹

In other words, TIERS was designed to provide information about how much
more children know in basic skills areas than they would have known had they
not participated in Title I programs. Each of the TIERS models uses
standardized achievement tests to provide information about how much children
know about basic skills. Each model compares children's scores on the
achievement tests at the end of the program with a no-treatment expectation
(i.e., what children would have known had they not participated in the
program). The difference between these two estimates of children's knowledge is
assumed to be attributable to the effect of the Title I program.
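The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation of any actual TIERS model: the function name `estimated_program_effect`, the use of normal curve equivalent (NCE) units, and all numbers are hypothetical. The sketch also shows why the validity concerns discussed in this chapter matter: any non-achievement influence on the posttest passes directly into the estimated effect.

```python
# Hypothetical sketch of the TIERS logic: the estimated program effect is the
# observed posttest level minus the no-treatment expectation. The NCE units
# and all values are illustrative assumptions, not data from this project.


def estimated_program_effect(observed_posttest: float,
                             no_treatment_expectation: float) -> float:
    """Return the difference that the evaluation attributes to the program."""
    return observed_posttest - no_treatment_expectation


if __name__ == "__main__":
    expectation = 37.5   # hypothetical no-treatment expectation (NCEs)
    observed = 42.0      # hypothetical observed posttest mean (NCEs)
    print("Estimated effect:", estimated_program_effect(observed, expectation))

    # Suppose poor test administration or low motivation lowers the observed
    # posttest by 3 NCEs without any change in what children actually know.
    # The estimated "program effect" shrinks by exactly that amount.
    depressed = observed - 3.0
    print("Estimated effect with confound:",
          estimated_program_effect(depressed, expectation))
```

Because the effect is a simple difference, a confounding factor that shifts the posttest by some amount shifts the estimated program effect by the same amount.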

¹For a more complete description of TIERS, see Tallmadge and Wood
(1981).



In addition to using scores from standardized achievement tests to
evaluate the impact of Title I programs, most Title I projects also use
standardized achievement test scores to select children to participate in
Title I programs and to make educational programming decisions about those
students once they have been placed in the program.

The validity of these decisions (i.e., decisions about program impact,
student placement, and programming for students) depends on the scores from
the standardized achievement tests actually measuring what the user of the
test results thinks they measure (which in most cases is the student's
knowledge of the basic skill area being tested). In other words, for the
results of TIERS to be useful, valid standardized test results must be
obtained. Cassell (1969) noted that there are at least two
conditions critical to obtaining valid standardized test results:

1) The student's score on the test must be a function of what the student
knows about a topic rather than of some other variable.

2) The test must be administered according to specified standardized 
procedures. 

There are difficulties associated with the failure to meet either of these 
conditions. 

An example of violating the first condition occurs when a test is a valid
instrument for one purpose or in one setting but does not yield a valid score
for the particular setting or purpose for which it is being used. Variables
such as a student's test-wiseness (test-taking skills) and level of
motivation may influence a test score so that an accurate estimate of academic
achievement for that particular student cannot be obtained. For instance, if
a student fills in the bubbles on the machine-scorable form too lightly to be
read, the resulting score will be lower than if the machine had read all of
the answers; or, if a student doesn't feel like taking a test that day, the
score will be different from the score on a day when the student is motivated to do
his/her best.

The validity of standardized test scores is also called into question
when a test administrator violates standardized administration procedures (the
second condition referred to above). When standardized test administration
procedures are not followed, children may misunderstand the directions,
cheating may occur, or time limits may be altered. Additionally, when
standardized administration procedures are not followed, comparing the
obtained scores to those of the norming group will probably be inappropriate.
For example, test results obtained by students who did not receive a practice
test prior to the actual test may provide diagnostic information, but they will not
be interpretable according to a norm group if the students in the norm group
did take a practice test.

Despite the importance of these factors in obtaining valid and 
interpretable scores, it appears that little is being done in many classrooms 
to assure that: 

1) tests are indeed measuring academic skills (and not level of
test-taking skills or motivation); and

2) standardized test administration procedures are followed. 

Most test companies encourage the use of standardized procedures by
including a section in the test manuals to alert teachers to the importance of
following standardized directions. Furthermore, most teachers have received
some training in the administration of standardized tests, and most people
recognize the importance of selecting tests that are appropriately matched
with the school's instructional emphasis and of encouraging students to do their
best. However, data collected by the Utah State Office of Education during
the 1979-1980 school year as one part of the previous project for State
Refinements to the ESEA Title I Evaluation and Reporting System (White,
Taylor, Eldred, & Carcelli, 1981; hereafter referred to as the "79-80 State
Refinements Project") indicated that substantial problems exist in Utah Title
I programs which make the interpretation of Title I evaluations difficult and
perhaps misleading, regardless of which of the TIERS models is used.

The 79-80 State Refinements Project identified four primary factors that
may be confounding the results of Title I evaluations designed to estimate how
much more students know as a result of Title I programs than they would have
known had they not participated in the program. These factors included:

1) the procedures used during test administration;

2) the test-taking skills of the students;

3) the format of the particular standardized achievement test that is
used; and

4) the motivational level of the students.

Data from the 79-80 State Refinements Project provided evidence that
existing problems in each of these four areas may confound the interpretation
of Title I evaluation results. The project described in this report expanded
on the previous project to (a) investigate more definitively the causal
relationship of the above factors with student test scores; and (b) design,
implement, and test the effectiveness of procedures designed to reduce or
eliminate factors in each of these areas which may confound the interpretation
of scores from standardized achievement tests. The project was a cooperative
effort among the Utah State Office of Education, researchers at Utah State
University, and four LEAs within the State of Utah. The remainder of this
section outlines the specific objectives of the project and explains the
importance and potential benefits of the study.
Objectives

The overall goal of this study was to design, implement, and test the
effectiveness of procedures and training packages designed to increase the
validity of standardized achievement test scores typically used throughout
Utah in implementing the Title I Evaluation and Reporting System. The
projected benefit of such development and evaluation work was increased
validity of results from the Title I Evaluation and Reporting System as a
consequence of standardized testing procedures being followed more rigorously
and of confounding factors such as test-wiseness, student motivation, and
testing format being reduced or eliminated. The specific objectives of the
study included:

1) LEA personnel administering standardized achievement tests used in the
Title I Evaluation and Reporting System will adhere more closely to
standardized testing procedures and will display more positive
attitudes and increased skill consistent with standardized testing
procedures in administering the tests.

2) Students will be more motivated to take standardized achievement tests
and will display higher levels of test-taking skills, which will
eliminate these factors as confounding variables in demonstrating what
students know.

3) The confounding effects of question format on student test scores will
be eliminated or reduced.

4) The causal relationship between scores on standardized achievement
tests and quality of test administration, student test-taking skills,
student motivation, and item format will be determined.

These objectives were addressed by designing, implementing, and
evaluating the effectiveness of experimental treatments for students in Title
I schools. Experimental treatments consisted of:

1) Training teachers in proper standardized test administration
procedures.

2) Training students in test-taking skills.

3) Implementing procedures during the regular school year for motivating
students to achieve well on tests.

4) Familiarizing students with the test formats used by their district's
standardized test.

Classrooms in Title I schools were randomly assigned to various
experimental and control conditions to test the effectiveness of the intervention
procedures and to establish what, if any, causal relationship existed between
these factors and students' test scores. The effects of the various
interventions were investigated using a variety of dependent variables, including
observation of teacher and student on-task behavior during testing, scores on
the Quality of Test Administration Checklist, student and teacher attitudes
toward testing, and students' scores on the standardized achievement test.

Importance and Benefits of Project

Results of the Title I Evaluation and Reporting System (TIERS) will be
invalid or misleading to the extent that factors such as administration
procedures, students' test-taking skills, student motivation, and test format
confound students' scores on standardized achievement tests. Because this
project developed and evaluated the effectiveness of procedures to eliminate
or control these variables in Title I evaluation situations, local, state,
and national Title I officials can better understand how TIERS results
should be interpreted.

As a result of the project, several training packages are available to
LEAs to train teachers in the proper administration of standardized
achievement tests and to train students in test-taking skills. The following
materials have been produced and are available from the Department of
Education or the Utah State Office of Education for the cost of reproduction.

1) Training Teachers in Test Administration Procedures: Presenter's
Guide (150 pp.).

2) Taking Tests: A Little Magic Always Helps (a series of nine
filmstrips, work booklets, and audiotapes).

3) How to Take Tests: Teacher's Manual (312 pp.; includes masters for
workbooks, practice tests, reinforcement procedures, and filmstrip
scripts for the filmstrip series).

The potential benefits of a project such as this for the U.S. Department
of Education are more far-reaching. Testing is a major industry in the
American educational system. The National Education Association estimates
that approximately 200 million achievement tests are administered annually in
the U.S. (McKenna, 1973). Of the three or four published tests that students
take each year, the majority are standardized measures. Scores from these
tests are a major source of the data used to report low achievement and
inequities in the delivery of educational services. If test scores are to be
used to document educational inequity, to compare results
across groups of students, and to make educational decisions, the test results
need to be valid and interpretable as indicators of student knowledge (i.e.,
scores must measure the skills the test was designed to measure).

Currently, test results are used at every level of education, from
teaching to formulating policy. The objectives addressed by this project are
particularly relevant for four different areas which include but extend well
beyond the concerns of Title I: (a) the use of norm-referenced evaluation
procedures, (b) the placement of students into special programs and
curricula, (c) the diagnosis of academic deficiencies, and (d) funding
and policymaking for selected educational groups.
First, the success of instructional programs is often determined by
comparing the pre- and posttest scores of the treatment group to scores of an
empirically established norming group. Group test scores that are found to be
sensitive to variations in testing procedures and in student motivational
levels may not be interpretable according to published norm tables. For
example, if a spring pretest and a spring posttest are used to evaluate a
program, the two tests were probably given by different teachers, and the gain
or loss may be attributable as much to the way the two tests were administered
as to the effects of the instructional program.

Second, students most affected by the use of group achievement test
scores for diagnostic and placement purposes are frequently those with
relatively low academic achievement levels or socioeconomic status who are in
programs such as special education, bilingual education, or Title I. Many of
the remedial groups in which a student may be placed have a limited number of
spaces that are filled by the students with the greatest need. If the basis for
placement in special instructional programs is a low test score, and if the
scores of some students are influenced by test-taking skills, low motivation,
or improper test administration, selection decisions may be incorrect.

A third area affected by variation in testing conditions is academic
assessment. Once students are placed into a program, academic deficiencies
should be precisely identified so that valuable instructional time is not
spent teaching skills that have already been mastered. If a student's score
is a function of misunderstood directions, low motivation, or poor
test-taking skills, deficiencies will be improperly identified, and development may
be retarded by incorrect instructional grouping or programming.

Finally, additional knowledge about the factors examined in this study is
important for funding and policy decisions that rely on student test scores.
For instance, if the correlation between ethnicity and test scores becomes
significantly lower when differences in testing conditions, test-taking
skills, and motivational levels are controlled, then ethnic group comparisons
made under uncontrolled conditions are less believable. That some data may
not be a valid estimate of achievement is particularly disconcerting when the
people who use the test scores for financial allocations (e.g.,
legislators) are removed from the test setting and are forced to rely on the
"facts" in score reports.
CHAPTER II

REVIEW OF RELATED RESEARCH 

Previous authors have suggested that student scores on educational 
tests may vary as a result of factors other than knowledge of the content 
being tested and random error (Ebel & Damrin, 1960; Thorndike, 1949; Vernon, 
1962). The purpose of this section is to review and synthesize the findings 
from previous research which were most relevant for the materials and 
procedures of this project. The discussion of the effect of the following 
three factors on test score differences among students establishes the 
theoretical and empirical basis for much of the work described in the 
Procedures Section. 

1) Reinforcement (RE)--giving students verbal or tangible rewards
contingent or noncontingent on test scores;

2) Student training in test-taking skills (ST)--providing students with
practice, coaching, or training in test-wiseness; and

3) Teacher training in test administration (TT)--training examiners on
how to implement standardized procedures and how to prepare students
for a test.

The procedures used for conducting the reviews are described first, and
then relevant studies are grouped and reviewed separately for each of the
three factors. A summary of the three reviews is the final section of this
chapter.

Procedures 

Two approaches were used to review and summarize the results from
previous research.¹ First, a "meta-analysis" was conducted on studies of
reinforcement and student training. A description and rationale of
meta-analysis are presented below. Second, because insufficient research was
located on the effects of teacher training to justify using meta-analysis,
findings from studies related to the administration of standardized tests are
presented. Since this research covers a variety of testing conditions, each
study is briefly described and summarized.

¹Additional work by the authors in developing prototypes of some of
the materials used in this project is described in the Technical Proposal for
RFP 80-034 and in the Final Report of the previous Utah State Refinements
contract. The theoretical review in this section does not refer to this work.

The remainder of this section defines meta-analysis, describes the 
meta-analysis procedures, and discusses previously completed reviews on 
Reinforcement (RE), Student Training (ST), and Teacher Training (TT). 

Meta-Analysis Defined 

The term meta-analysis was introduced by Gene Glass in 1976 to describe
the statistical analyses performed on the results of individual studies for
the purpose of integrating findings. McGaw and White (1981) quote Glass:

The approach to research integration referred to as 
"meta-analysis" is nothing more than an attitude of 
data analysis applied to quantitative summaries of 
individual experiments. By recording the properties 
of studies and their findings in quantitative terms, 
the meta-analysis of research invites one who would
integrate numerous and diverse findings to apply the
full power of statistical methods to the task.
Thus, it is not a technique; rather, it is a
perspective that uses many techniques of measurement
and statistical analysis. (p. 12)

Hence, meta-analysis is not a new methodology--it is an approach that
uses different existing research techniques depending on the analyses to be
completed. Three characteristics distinguish meta-analysis:

1. The outcomes of individual studies are quantified on a common metric
so that results can be compared across studies. Examples of
quantification are the use of standardized scores (such as IQ), r,
omega squared (ω²), eta squared (η²), or Glass's ES (see below).

2. A comprehensive list, or at least a representative sample, of studies
is considered (including journals, government reports, dissertations,
and unpublished material) so that results of the review can be
generalized to the research that has been conducted on that topic.

3. Characteristics of individual studies (e.g., size of sample, type of
design, and age of students) are quantified and coded so that the
covariation of study characteristics and the outcomes can be 
systematically and empirically examined. 

To conduct a meta-analysis, a comprehensive list or a representative
sample of studies is identified by clearly specified procedures. Next, the
features of the studies are coded quantitatively and outcomes are converted 
into a common metric. Finally, findings are described and analyzed by 
statistical procedures to examine the covariation of study characteristics 
with the outcome measures. 
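The three steps can be illustrated with a small Python sketch. It assumes the studies have already been located and coded; the record fields (`design_quality`, `test_type`, `es`), the function `mean_es_by`, and the effect sizes are all hypothetical. Averaging effect sizes within each level of a coded characteristic is only the simplest way to examine covariation; regressing effect size on several coded characteristics is a common alternative.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical coded studies. Each record holds coded study characteristics
# plus an outcome already converted to the common metric (effect size).
# Field names and values are illustrative, not data from this review.
CODED_STUDIES = [
    {"study": "A", "design_quality": "high", "test_type": "IQ", "es": 0.20},
    {"study": "B", "design_quality": "high", "test_type": "achievement", "es": 0.40},
    {"study": "C", "design_quality": "low", "test_type": "IQ", "es": 0.50},
    {"study": "D", "design_quality": "low", "test_type": "achievement", "es": 0.70},
]


def mean_es_by(studies, characteristic):
    """Group effect sizes by one coded characteristic and average each group.

    Comparing the group means is a simple way to examine how a study
    characteristic covaries with the outcome.
    """
    groups = defaultdict(list)
    for record in studies:
        groups[record[characteristic]].append(record["es"])
    return {level: mean(values) for level, values in groups.items()}


if __name__ == "__main__":
    for characteristic in ("design_quality", "test_type"):
        summary = mean_es_by(CODED_STUDIES, characteristic)
        for level, avg in sorted(summary.items()):
            print(f"{characteristic}={level}: mean ES = {avg:.2f}")
```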

A meta-analysis offers several advantages (see below) over more typical
reviews of research, which often present studies with differing results and no
conclusions. Glass (1977) summarized the problems frequently encountered in
reviews of existing research:

1. Literature searches are haphazard and selective, and often omit 
dissertations. 

2. Reviews are typically narrative and discursive; findings are often
difficult to understand without quantification.

3. Reviewers who attempt to quantify studies generally (and
inappropriately) use statistical significance as a method of
integrating studies to draw conclusions.

Meta-analysis, though not a panacea for all review ills, does solve
many difficulties associated with traditional reviews by being somewhat
more unprejudiced, quantitative, and generalizable. First, it is
unprejudiced because studies are not arbitrarily or selectively excluded
on the basis of quality (e.g., poor design, questionable implementation,
inappropriate dependent variables). Instead, a representative sample of
previously completed research is considered, and characteristics of design
and analysis that contribute to the quality (i.e., good vs. bad) of the
research are simply coded for use in further analysis.

Second, meta-analysis is quantitative because outcomes from large
numbers of studies can be organized by using the same metric. This
common metric, referred to as effect size (ES), is usually defined as

$$ES = \frac{\bar{X}_{\text{experimental}} - \bar{X}_{\text{control}}}{s_{\text{control}}} \qquad (1)$$

(Glass, 1977), where $\bar{X}$ denotes a group mean and $s_{\text{control}}$ is
the standard deviation of the control group.

Finally, meta-analysis yields more generalizable results because
the studies selected for use in the meta-analysis must be comprehensive
(include all the research) or representative (randomly sampled to
typify all research). In addition, the relevant characteristics of each
study are coded and entered into the analysis as variables. This process
encourages stronger, more adequately supported conclusions than reviews
which synthesize research on the basis of methodology or statistical
significance.
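Equation 1 can be written as a short Python function. This is a minimal sketch: the function name `glass_effect_size` and the example scores are hypothetical, and using the sample (n - 1) standard deviation for the control group is an assumption, since the text does not specify which form was used.

```python
from statistics import mean, stdev


def glass_effect_size(experimental, control):
    """Effect size per Equation 1: (mean_E - mean_C) / SD_C.

    The denominator is the control group's standard deviation. Using the
    sample (n - 1) standard deviation here is an assumption of this sketch.
    """
    sd_control = stdev(control)
    if sd_control == 0:
        raise ValueError("control-group standard deviation is zero")
    return (mean(experimental) - mean(control)) / sd_control


if __name__ == "__main__":
    # Hypothetical raw scores for two small groups.
    control_scores = [40, 45, 50, 55, 60]
    experimental_scores = [45, 50, 55, 60, 65]
    es = glass_effect_size(experimental_scores, control_scores)
    print(f"ES = {es:.2f}")
```

With these invented scores the means differ by 5 points and the control standard deviation is about 7.9, so the ES is about 0.63: the experimental mean lies roughly two-thirds of a control-group standard deviation above the control mean.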

Procedures for Meta-Analysis

This section describes the procedures used to locate studies and
code study characteristics for the meta-analysis. The steps explained
below are those used to complete two separate meta-analyses, one
for reinforcement and another for student training. Specific
details about the individual analyses are provided in the sections
Reinforcement and Student Training.

Locating studies. The first step was to collect all the studies
regarding reinforcing test behaviors (RE), student training in
test-taking skills (ST), or teacher training in test administration (TT). The
primary sources for these studies were four computer searches of library
databases conducted at Utah State University. The databases included
Educational Resources Information Center (ERIC), Current Index to
Journals in Education (CIJE), Psychological Abstracts, and Dissertation
Abstracts. Computer searches using combinations of the following
descriptors yielded 31 RE, 79 ST, and 0 TT titles:

    test(ing)(s)            test-wiseness (TW)     student
    administration(tor)     elementary             teacher
    reinforce(r)(ment)      test score(s)          exam(iner)(ation)
    train(ing)              motivation             practice
    standardize(d)          reward(s)              coaching
    intelligence (IQ)       achievement            aptitude

Since no research was located for the TT factor, this factor was
dropped from the meta-analysis. Once the articles for RE and ST were
located, their bibliographies and references provided a second source of
studies.

To qualify for inclusion in the meta-analysis, articles had to meet
the following criteria:

1. The test used to measure the outcome had to be a standardized
intelligence or achievement test (aptitude tests were classified as
achievement tests).

2. At least one independent variable had to be a "treatment"
applied to subjects.

3. The outcome data had to be reported as test scores.

4. The research could not be supported by a test publisher because
of the possible bias that might ensue during the study.
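The four inclusion criteria amount to a conjunction of yes/no checks, which the following Python sketch makes explicit. The `CandidateStudy` fields, the helper names, and the example titles are hypothetical codings invented for illustration; the sketch only mirrors the rules as stated above, including the reclassification of aptitude tests as achievement tests.

```python
from dataclasses import dataclass

# Outcome test types the criteria accept after reclassification.
ACCEPTED_TYPES = {"intelligence", "achievement"}


@dataclass
class CandidateStudy:
    """Hypothetical coding of one article against the four criteria."""
    title: str
    outcome_is_standardized: bool   # criterion 1: standardized outcome test
    outcome_type: str               # "intelligence", "achievement", or "aptitude"
    has_treatment: bool             # criterion 2: a treatment applied to subjects
    reports_test_scores: bool       # criterion 3: outcomes reported as test scores
    publisher_supported: bool       # criterion 4: excluded if True


def classify_outcome(outcome_type: str) -> str:
    """Treat aptitude tests as achievement tests, as criterion 1 specifies."""
    return "achievement" if outcome_type == "aptitude" else outcome_type


def qualifies(study: CandidateStudy) -> bool:
    """Return True only if the article meets all four inclusion criteria."""
    return (
        study.outcome_is_standardized
        and classify_outcome(study.outcome_type) in ACCEPTED_TYPES
        and study.has_treatment
        and study.reports_test_scores
        and not study.publisher_supported
    )


if __name__ == "__main__":
    candidates = [
        CandidateStudy("Practice test study", True, "aptitude", True, True, False),
        CandidateStudy("Teacher-made quiz study", False, "achievement", True, True, False),
        CandidateStudy("Publisher-funded study", True, "intelligence", True, True, True),
    ]
    included = [c.title for c in candidates if qualifies(c)]
    print("Included:", included)
```

The second and third invented candidates fail for the two reasons the text reports as most common: a nonstandardized outcome measure and affiliation with a test publisher.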

Some of the articles failed to meet criteria and were therefore 
excluded. The most common deficiencies found in rejected studies were 
the use of a nonstandardized outcome measure (n = 17) and researcher 
affiliation with a test publisher (n = 27). 

Some articles described more than one treatment effect, and each
effect within an article was coded separately. For example, if the
impact of practice testing was measured twice (immediately following the
practice and after one month), scores from both posttests were used to
compute two effects. The final yield was 4^ RE effects from 18 articles
and 62 ST effects from 37 articles. The 55 articles used in the
meta-analyses are identified in the References as "R" for reinforcement
and "T" for student training.

Coding study characteristics. The next step in the meta-analyses
was to describe the study characteristics. To determine what information
to include in the analyses, all the studies were read, and preliminary
estimates were made as to which research conditions might affect the
relationship between test scores and reinforcement or student training.
These conditions were listed on a summary sheet so that each article
could be quantified on various characteristics. Examples of the coding
sheets used for RE and ST are located in Appendix A. The following study
characteristics were coded for both Reinforcement studies and Student
Training studies: number of subjects, mean age of subjects, mean IQ of
subjects, type of test used as a dependent variable (IQ, achievement, or
aptitude), test administration unit (group or individual), type of
research design, quality of research design, effect size, and
investigator's conclusion about the effectiveness of the intervention.
For the Reinforcement studies, the type of reinforcement (money, candy,
praise, reproof, token, choice, and prize), the schedule (immediate or
delayed), and the contingency (contingent or noncontingent on correct
test scores) were also coded. The type of training (practice or
test-wiseness) provided was coded for the Student Training studies.

"High" quality was coded when studies basically accounted for
internal and external threats to validity (Bracht & Glass, 1968; Campbell
& Stanley, 1963). "Low" quality was assigned to studies that failed to
control for one or more major extraneous variables.

One or more effect sizes (ES) were computed for each study using Equation 1,
which transformed the mean student test scores into a common
index that described the impact of the intervention (reinforcement or
student training) and could be compared across studies. For studies that
did not report standard deviations or means on the outcome variable, the
ES was calculated from statistics such as t or F, using procedures
outlined by McGaw and Glass (1981). For pre-post studies in which one
group was compared against itself, the pretest mean score was used in
the calculations as the control group mean.
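The report does not reproduce McGaw and Glass's conversion procedures. As a hedged illustration only, the Python sketch below uses two standard textbook conversions for a two-group comparison: ES = t·√(1/n_E + 1/n_C), and, for an F test with one numerator degree of freedom, |t| = √F. These give the mean difference in pooled-standard-deviation units, which approximates but is not identical to the control-SD effect size of Equation 1. The function names and the example values are hypothetical.

```python
import math


def es_from_t(t: float, n_experimental: int, n_control: int) -> float:
    """Effect size from an independent-groups t statistic.

    Uses ES = t * sqrt(1/n_E + 1/n_C), which expresses the mean difference
    in pooled-SD units (an approximation to the control-SD ES).
    """
    return t * math.sqrt(1.0 / n_experimental + 1.0 / n_control)


def es_from_f(f: float, n_experimental: int, n_control: int,
              experimental_higher: bool = True) -> float:
    """Effect size from a two-group F with 1 numerator df, where F = t**2.

    F carries no sign, so the direction of the difference must be supplied
    from the reported means or the text of the study.
    """
    t = math.sqrt(f)
    sign = 1.0 if experimental_higher else -1.0
    return sign * es_from_t(t, n_experimental, n_control)


if __name__ == "__main__":
    # Hypothetical study: t = 2.0 with 20 students in each group.
    print(f"ES from t: {es_from_t(2.0, 20, 20):.2f}")
    # The same result reported as F(1, 38) = 4.0.
    print(f"ES from F: {es_from_f(4.0, 20, 20):.2f}")
```

Both calls describe the same hypothetical result and give an ES of about 0.63; a study reporting only F needs the direction of the difference taken from elsewhere in the article.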

Previous Reviews

Seven previous reviews on test-taking skills or test administration
were located, and the characteristics of each review are summarized in
Table 1. Five reviews summarized research on training students in
test-wiseness. Two reviews described studies which examined procedures
related to test administration, although none of these studies examined the
effects of training examiners. No reviews were located on reinforcing
testing behaviors.

In general, the previous reviews lacked three critical components
(Glass & Smith, 1979): (a) a systematic method for identifying studies
to be included; (b) a common index for quantifying data for comparisons
across studies; and (c) a systematic integration of the reviewed data
into a meaningful summary.

Although one reviewer (Vernon, 1954) reported results as effect sizes
by converting all mean scores to IQ units, the covariation of study
characteristics with outcomes was not considered systematically.
Sattler and Theve (1967) described research in terms of level of
significance and drew conclusions by "voting" (see Light & Smith, 1971)
on the number of studies that obtained statistical significance versus
the number of studies with nonsignificant results. The remaining five reviews
discussed studies in terms of the conclusions drawn by the primary
researchers.
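The weakness of the "voting" approach can be shown with a small hypothetical example in Python: two invented sets of study results receive identical vote tallies even though their average effect sizes differ by more than a factor of ten. The study values and function names are assumptions for illustration only, and the voting rule here simply counts studies significant at the .05 level.

```python
from statistics import mean

# Each hypothetical study result: (effect size, significant at the .05 level?)
SMALL_EFFECTS = [(0.05, True), (0.04, True), (0.06, False)]
LARGE_EFFECTS = [(0.80, True), (0.90, True), (0.70, False)]


def vote_count(results):
    """Tally significant vs. nonsignificant studies (the "voting" method)."""
    significant = sum(1 for _, is_significant in results if is_significant)
    return significant, len(results) - significant


def mean_effect_size(results):
    """Average effect size, which keeps the magnitude that voting discards."""
    return mean(es for es, _ in results)


if __name__ == "__main__":
    for label, results in (("small", SMALL_EFFECTS), ("large", LARGE_EFFECTS)):
        sig, nonsig = vote_count(results)
        print(f"{label}: votes {sig}-{nonsig}, "
              f"mean ES {mean_effect_size(results):.2f}")
```

Both sets "win" the vote two to one, so a voting review would treat them as equivalent evidence, while their mean effect sizes (about 0.05 versus 0.80) tell very different stories.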

Table 1

Summary of Previous Reviews

Author, year | Topic of review | Type of sampleᵃ | Method of selection specified?ᵇ | Previous reviews cited and critiqued? | Outcomes of individual studies reported in terms ofᶜ | Number of studies reviewed | Conclusions about the effectiveness of treatmentᵈ
--- | --- | --- | --- | --- | --- | --- | ---
Fueyo, 197? | Test-taking skills | Convenience | No | No | Conclusions | ? | Effective
Kirkland, 1971 | Test administration | Convenience | No | No | Conclusions | ? | Inconclusive
Millman, Bishop, & Ebel, 1965 | Test-taking skills | Convenience | No | No | Conclusions | 8 | Effective
[illegible], 1979 | Test-taking skills | Convenience | No | No | Conclusions | ? | Effective
Sarnacki, 1979 | Test-taking skills | Convenience | No | No | Conclusions | ? | Effective
Sattler & Theve, 1967 | Test administration | Convenience | No | ? | Statistical significance | 56 | Effective
Vernon, 1954 | Test-taking skills | Comprehensive | No | No | Effect size | 20 | Effective

ᵃIf the review was based on a limited number of studies and gave no procedures for how studies were selected, it was assumed that the sample was a convenience sample.

ᵇTo be coded "yes," the specific procedures used to identify and select articles for the review had to be described.

ᶜEffect size refers to any kind of measure which could be compared on a common metric across all studies. To be coded statistical significance, the review had to report whether the significance was in favor of or against the treatment for the majority of studies reviewed. Reviews that reported the primary investigators' conclusions without mentioning statistical significance were coded conclusions.

ᵈEntries in this column reflect the authors' stated opinion in the review article.



Two reviews contained criticism of the primary research in terms of 
design or confounding factors, but the consideration that reviewers gave 
to those problems in selecting studies, comparing effectiveness in 
outcomes, or drawing conclusions was undefined and appeared to be 
unsystematic. The results and conclusions of the review articles are 
discussed in the appropriate ST or TT section of this chapter. 

The remainder of this chapter is divided into four sections:
Reinforcement, Training Students (in test-taking skills), Training
Teachers (in test administration), and Summary and Conclusions. The
first two parts utilize meta-analysis procedures to integrate the
existing research. Since no research was identified on the effect of
training teachers to administer tests (section three), a short narrative
review of research on related topics is provided. The results of the
meta-analyses and related test administration research are summarized in
the final section of the chapter.

Meta-Analysis of Research on Reinforcement 

Research has demonstrated the positive effects of rewarding
various types of academic behavior, including test taking (Axelrod, 1972;
Ullman & Krasner, 1965). However, no reviews of primary research were
located which surveyed the studies that specifically investigated the
effect of reinforcement procedures by using test score as the outcome
measure.

Two recently completed dissertations (Baer, 1978; Weiss, 1980) include reviews of previous research on the effects of reinforcing intelligence test-taking behaviors. In both reviews the primary research was grouped by subject IQ level into low, average, and high. Both reviews concluded that students with initially low IQ scores show significant gains in IQ scores over controls when correct responses are reinforced on a second test. However, studies that examined the effect of reinforcement on students with high IQs found no significant changes in IQ from the first to the second testing. Similarly, most of the studies that examined average IQ students found nonsignificant changes in IQ levels.

The primary purpose of this section is to report the results of a meta-analysis of previous research to answer the question: Does reinforcement increase test scores? This section reviews the primary research on reinforcing test-taking behaviors and contains a description of a typical study, the results of a meta-analysis of previous studies, a summary, and the conclusions.

A Typical Study

In a typical study included in the meta-analysis, standardized tests were administered twice to two groups of students. The control group received two identical administrations, both following standardized procedures. The treatment group was given only one standardized administration and was then retested using standardized procedures, except that a reward was provided to students who scored higher than they did on the first test. Scores from the first and second test administrations were compared to determine whether reinforcement resulted in significantly higher scores.
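In a meta-analysis, the comparison described above is expressed as an effect size (ES). This section does not state the exact formula used, so the sketch below assumes Glass's delta: the difference between the treatment and control means divided by the control-group standard deviation, a common choice in meta-analyses of this kind. The function name and the sample scores are hypothetical.

```python
from statistics import mean, stdev


def glass_delta(treatment_scores, control_scores):
    """Effect size = (treatment mean - control mean) / control-group SD.

    Assumption: Glass's delta is used for illustration; the report's
    exact ES formula is not given in this section.
    """
    if len(control_scores) < 2:
        raise ValueError("need at least two control scores to estimate an SD")
    spread = stdev(control_scores)  # sample SD of the nonreinforced group
    return (mean(treatment_scores) - mean(control_scores)) / spread


# Hypothetical second-test scores, for illustration only.
reinforced = [12, 14, 15, 17]
nonreinforced = [10, 12, 13, 15]
print(round(glass_delta(reinforced, nonreinforced), 2))  # 0.96
```

Dividing by the control-group SD keeps the unit of measurement tied to the untreated (nonreinforced) score distribution, which is what allows ESs from different tests to be averaged on a common metric.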
Results of the "Reinforcement" Meta-Analysis

The 41 effect sizes that were identified in 18 articles were used in the meta-analysis. A summary listing of ESs by study is included in Appendix A. The articles describe the impact of providing different types and schedules of reinforcement on the academic and IQ test scores of students aged 4 through 23. The articles from which the studies were drawn were published from 1917 through 1980; three were doctoral dissertations.

Overall effects. Descriptive statistics were used to compare the results of reinforced and nonreinforced testing conditions. Table 2 lists the study characteristics coded for each study (including the coding categories), the number of effect sizes in each category, and the mean effect size. According to the investigators, reinforcement was effective in increasing test scores in 56% (23/41) of the reported effects. Only two authors concluded that reinforcement did not increase scores. Sixteen ESs were judged by the authors as inconclusive. In other words, most investigators who have examined the effect of reinforcement on test scores have concluded that reinforcement does raise test scores.

These conclusions are empirically supported by the fact that the mean ES across all studies was .50, with a standard deviation of .58 and a standard error of .09. That is, when students are reinforced for scoring higher than predicted from the pretest, scores under the reinforced condition are one-half standard deviation higher than the mean score obtained under nonreinforced conditions. This implies that a typical student who is reinforced for scoring higher than predicted will score at the 69th percentile on an achievement test, whereas if the same
Table 2

Categories for Describing Reinforcement Studies, Number of Effects in Each Category, and Mean Effect Size

Characteristic            Coding categories            Number of      Effect     Standard
                                                       effect sizes   size       deviation

Number of subjects        12 - 29                      14              .50        .82
                          30 - 100                     23              .54        .43
                          Over 100                      4              .26        .41
Age of subjects           4 - 6                         4             1.00        .56
                          7 - 10                       23              .37        .65
                          11 - 23                      14              .57        .40
IQ of subjects            43 - 85                       9             1.10        .70
                          86 - 100                     21              .45        .43
                          Over 100                     11              .10        [illegible]
Type of reinforcer        Money                         5              .61        .59
                          Candy                         8              .34       1.06
                          Praise                        7              .40        .34
                          Reproof                       2              .10        .06
                          Token                        12              .55        .41
                          Choice                        3              .88        .20
                          Prize                         4              .60        .62
Type of reinforcement     Immediate (after item)       31              .49        .63
  schedule                Immediate (after subtest)     6              .61        .30
                          Delayed                       4              .43        .62
Contingency               Contingent                   32              .51        .63
                          Noncontingent                 9              .40        .39
Type of test              Academic                     12              .53        .37
                          Intelligence                 29              .49        .66
Administration unit       Individual                   36              .51        .61
                          Group                         5              .49        .54
Type of design            True experimental            23              .54        .72
                          Quasi-experimental           14              .47        .35
                          Pre/post                      4              .37        .44
Quality of design         High                         28              .46        .64
                          Low                          13              .58        .46
Conclusions drawn in      Treatment worked             23              .77        .62
  study                   Inconclusive                 16              .15        .27
                          Treatment did not work        2             [illegible] .64
Overall                                                41              .50        .58
student were not reinforced, he or she would score at the 50th percentile.
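The standard error and the 50th-to-69th percentile translation quoted above can be reproduced with a short sketch. It assumes normally distributed scores and computes the standard error as the SD of the effect sizes divided by the square root of their number; both are assumptions of this sketch rather than procedures stated by the authors, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sd_of_es, n_es):
    """Standard error of a mean effect size: SD / sqrt(number of ESs)."""
    return sd_of_es / sqrt(n_es)


def shifted_percentile(start_percentile, effect_size):
    """Percentile reached when a score moves up by `effect_size` SD units.

    Assumes normally distributed scores: convert the starting percentile
    to a z score, add the effect size, and convert back.
    """
    normal = NormalDist()  # standard normal: mean 0, SD 1
    z = normal.inv_cdf(start_percentile / 100)
    return 100 * normal.cdf(z + effect_size)


print(round(standard_error(0.58, 41), 2))      # 0.09
print(round(shifted_percentile(50, 0.50)))     # 69
print(round(shifted_percentile(20, 0.50), 1))  # 36.6
```

Under this normal-curve conversion, a student starting at the 20th percentile moves to roughly the 36.6th percentile with an ES of .50, consistent with the 36th percentile given for low achievers of average IQ. The text does not state which conversion was used for its other percentile figures, so results for other starting points may not match them exactly.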

For low achievers with average IQs (i.e., 85 to 115), scores at the 20th percentile under nonreinforced conditions would be at the 36th percentile under reinforced conditions. However, this translation from effect size to percentile must be interpreted in conjunction with the finding that IQ influences the impact of reinforcement procedures (see the discussion of IQ below).

The Joint Dissemination Review Panel (1977) has described effects of the magnitude found in this meta-analysis (ES = .50) as educationally significant, and Cohen (1977) has characterized a difference of one-half standard deviation as a medium-sized effect. The distribution of effect sizes is graphed in Figure 1. ESs varied across studies, but 39% (16/41) of the effects were .50 standard deviation units or more. Nearly one-third (29%) of the studies reported larger effects (ES = .75 or higher) in favor of reinforcement.

As indicated in Figure 1, the distribution (mode = 0) of ESs is 
positively skewed toward high ESs. The median ES of .29 may be a better 
indicator of central tendency than the mean (.50) because of five 
extremely high ESs over 1.00. However, since three of the five ESs were 
from high quality studies (see Appendix A) and the median would not 
reflect their impact, the mean is used to represent the overall effect of 
reinforcement studies. 

The data displayed in Table 2 show that, on the average, reinforcing students for performing better on standardized educational tests results
[Figure: frequency distribution of effect sizes; horizontal axis: effect size; median = .29, mean = .50]

Figure 1. Distribution of 41 effect sizes for reinforcement studies considered in the meta-analysis.
in substantially higher scores (mean ES = .50, median ES = .29). These data imply that for some students, scores obtained under nonreinforced conditions may not be indicative of their true achievement level. However, the overall results must be interpreted in conjunction with a number of other variables considered in the meta-analysis. The most important of these variables is the IQ level of students in the sample.

IQ of students in the sample. Test scores from low IQ students (43 through 85) are more affected by reinforcement (ES = 1.10) than scores from medium (ES = .45) or high (ES = .10) IQ students. Translated into IQ points, a student with an initial IQ of 60 would score 76.5 when reinforced on an intelligence test. A low IQ student scoring at the 20th percentile on an achievement test would shift to the 56th percentile if reinforced. An ES of .10 indicates that a high IQ student may slightly increase a score when reinforced during an intelligence or achievement test. However, the low ES must be interpreted with caution, because such a student may already be scoring very close to the highest possible score and may be unable to score higher regardless of motivation or circumstance.
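Converting an effect size into IQ points requires the standard deviation of the IQ scale, which this section does not state. The sketch below assumes SD = 15; under that assumption it reproduces the 60-to-76.5 figure above (ES = 1.10) and the 60-to-81 figure the text reports for contingent reinforcement of low IQ students (ES = 1.42). The constant and function names are hypothetical.

```python
IQ_SD = 15  # assumed standard deviation of the IQ scale (not stated in the text)


def reinforced_iq(initial_iq, effect_size, iq_sd=IQ_SD):
    """Expected IQ under reinforcement: initial score plus ES times scale SD."""
    return initial_iq + effect_size * iq_sd


print(round(reinforced_iq(60, 1.10), 1))  # 76.5  (low IQ, all reinforcement)
print(round(reinforced_iq(60, 1.42), 1))  # 81.3  (low IQ, contingent)
```

Because an ES is expressed in SD units, the same ES corresponds to more raw points on a scale with a larger SD; the conversion is only as good as the assumed scale SD.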

Table 3 presents a further breakdown of how the IQ of students, in conjunction with other study characteristics, influences the ES of the study. Reinforced low IQ students (43 through 85) aged 4 through 6 are affected most by reinforcement procedures. Within the other categories as well, low IQ is associated with larger effect sizes than medium IQ, which in turn is associated with larger effect sizes than high IQ.
Table 3

Mean Effect Size by IQ for Contingency, Quality, Age, Design, Test Type, and Number of Subjects

                                            IQ
Characteristic            43 - 85      86 - 100     Over 100     Overall

Contingency
  Contingent              1.42 (5)      .48 (19)     .01 (8)      .51 (32)
  Noncontingent            .70 (4)      .13 (2)      .35 (3)      .39 (9)
Quality
  High                     .92 (6)      .66 (16)     .29 (6)      .46 (28)
  Low                     1.19 (3)      .38 (5)      .15 (5)      .58 (13)
Age
  4 - 6                   1.39 (2)      .95 (1)      .29 (1)     1.00 (4)
  7 - 10                  1.46 (2)      .41 (9)      .07 (9)      .37 (23)
  11 - 23                  .84 (5)      .45 (11)     .23 (1)      .57 (14)
Design
  True experimental       1.19 (5)      .51 (11)     .05 (6)      .54 (23)
  Quasi-experimental      1.02 (2)      .35 (9)      .46 (3)      .47 (14)
  Pre/post                 .72 (2)      .66 (1)      .05 (2)      .37 (4)
Test type
  Achievement              .82 (2)      .48 (10)     --           .53 (12)
  Intelligence            1.18 (7)      .42 (11)     .10 (11)     .49 (29)
Number of subjects
  12 - 29                 1.24 (5)      .30 (4)      .09 (5)      .50 (14)
  30 - 100                 .92 (4)      .49 (17)     .26 (2)      .54 (23)
  Over 100                 --           --           .26 (4)      .26 (4)
Overall                   1.10 (9)      .45 (21)     .10 (11)     .50 (41)

Note. Numbers in parentheses indicate the number of ESs.
Contingent reinforcement has a greater impact on low IQ students than noncontingent reinforcement. The data in Table 3 show that scores from low IQ students increase by 1.42 standard deviation units under contingent reinforcement conditions and are considerably more affected by reinforcement than scores from high IQ students who are reinforced contingently (ES = .01). Therefore, under contingent reinforcement conditions, a low IQ student may increase an IQ score from 60 to 81 or an achievement test score from the 20th to the 77th percentile.
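Because each IQ subgroup contributes a different number of ESs, an Overall entry in Table 3 should agree with a count-weighted mean of the subgroup means, not with a simple average of the three cells (which, for the contingent row, would give about .64). A minimal check using the contingent row and the overall row of Table 3; the function name is hypothetical.

```python
def pooled_mean(subgroups):
    """Count-weighted mean of subgroup means.

    `subgroups` is a list of (mean_es, number_of_es) pairs; returns the
    pooled mean and the total number of ESs.
    """
    total_n = sum(n for _, n in subgroups)
    weighted_sum = sum(mean_es * n for mean_es, n in subgroups)
    return weighted_sum / total_n, total_n


# (mean ES, number of ESs) for low, medium, and high IQ, from Table 3.
contingent = [(1.42, 5), (0.48, 19), (0.01, 8)]
all_studies = [(1.10, 9), (0.45, 21), (0.10, 11)]

for label, rows in [("contingent", contingent), ("overall", all_studies)]:
    es, n = pooled_mean(rows)
    print(label, round(es, 2), n)
# contingent 0.51 32
# overall 0.5 41
```

The same weighting explains why the overall ES (.50) sits much closer to the medium-IQ mean (.45, 21 ESs) than to the low-IQ mean (1.10, 9 ESs).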

In many cases the small number of ESs available for analysis requires fairly cautious interpretation of the estimated impact of various conditions. However, the trend supported by these data indicates that there is an inverse relationship between student IQ and the amount of increase in test scores from unreinforced to reinforced test conditions. In addition, the data represent all of the research which could be located to address these questions and consequently represent the best estimate until further research is conducted.

Type of test and administration unit. Most of the studies measured intelligence (71%) and were individually administered (88%). There do not appear to be significant differences in outcomes between types of tests or units of administration (see Table 2), but there is evidence that group-administered IQ tests resulted in smaller effects (ES = .07, n = 3) than individually administered IQ tests (ES = .53, n = 26; see Table 4). These data provide fairly clear evidence that reinforcement on individually administered tests results in higher scores. However, the data for group-administered tests are more equivocal.



Table 4

Mean ES by Unit of Test Administration for Type of Test

                    Unit of test administration
Type of test        Individual     Group        Overall

IQ                  .53 (26)        .07 (3)     .49 (29)
Achievement         .46 (10)       1.12 (2)     .53 (12)
Overall             .51 (36)        .49 (5)     .50 (41)

Note. The numbers in parentheses indicate number of ESs.



Although the overall means indicate no differences between IQ and academic tests or between individual and group-administered tests, the disparity in Table 4 between group-administered IQ and academic tests raises some important questions. The three studies which administered group intelligence tests were undertaken in the 1930s with elementary (ES = .08), junior high (ES = -.11), or college (ES = .23) students. The authors of those studies described the reinforcement treatment as promising prizes or providing praise and encouragement. However, rivalry appeared to be the basis for rewards and for the appeal to "try your best." In all studies, students were urged to increase their rank position by competing with those of higher standing or with the control group. The use of rivalry as a motivational technique is questionable, as suggested by the low mean effect size of .07. Perhaps rivalry is age-dependent; that is, it may be more effective with college students than with younger students.
Two articles described the effects of reinforcing group achievement test behavior, which is the focus of the present research study. In one study (Ayllon & Kelly, 1972), a classroom of 30 normal fourth-graders was given token reinforcement for correct responses to questions on a standard achievement test. Tokens were delivered after each subtest, and back-up reinforcers were available after the total test. A statistically significant difference between reinforcing and nonreinforcing conditions was achieved (t(30) = 5.90, p < .01), and the effect size was .66. As with most pre/post designs, there were several factors that threatened both internal validity (history, maturation, testing) and external validity (incomplete description of treatment, Hawthorne effect, pretest sensitization). The extent to which the significant results of this experiment can be generalized is questionable due to the threats listed above, the small number of subjects, and the single classroom used. (See Campbell and Stanley, 1963; Bracht and Glass, 1968, for a thorough discussion of these rival hypotheses.)

A second study (Chapman & Feder, 1917), like many early reports, omitted much of the relevant treatment description. Essentially, extended practice on three math tests was given to two groups of 16 fifth grade students who were matched on addition test scores. Group B worked under normal conditions and Group A was given external incentives (i.e., stars and back-up reinforcers) for high scores or improvement. Data were kept for ten consecutive days and visually analyzed by graphing the daily mean test scores of both groups. The results showed the mean test score for Group B to be higher than for Group A at every data point. Several methodological problems in this study threatened internal and external validity.
First, the students were all from the same classroom and were probably not isolated during the testing or the delivery of reinforcement. Therefore, the influence of prizes being given to Group B may have depressed the scores of Group A.

Second, although stars and prizes were used to motivate the students, competition was the more likely incentive for Group B. That is, each day's scores were published and students were encouraged to "beat" their last score and their classmates. Stars and prizes (given at the end of the study) were given only to the top 50% for efficiency and improvement.

The third potential extraneous variable was that students were matched on scores only from the test (addition) that obtained substantially different results between treatment and control groups. For the addition test, Group A's scores actually decreased from the first to the tenth data point while Group B's increased. Scores for Group A students increased in a similar manner to Group B on the other two tests.

Fourth, although students were matched on scores from one dependent variable, it was only a ten-minute test. The fact that final scores on the other two measures did not differ between groups creates suspicion that the matching criteria may have been biased.

Fifth, the subtests were too short (i.e., 10, 5, and 1 minutes) and not properly standardized by today's standards, and the number of subjects was too small to justify generalization of the results.

While the data in both of these studies support the notion that reinforcement improves group academic test scores, both reports are of insufficient quality to rely on the findings.



The small number of available ESs, the poor quality of existing research on this topic, and the disparity in previous results indicate a need for additional research investigating the effect of reinforcement using group-administered tests. Research investigating group-administered academic tests is particularly important because of the frequency with which these tests are used to make educational decisions about students, which might be influenced by the instructional level of the student.

Other study characteristics. IQ was found to account for most of the variance in ES across the categories of various study characteristics. To illustrate the influence of IQ, note that the data in Table 2 indicate that studies with over 100 subjects (n = 4) have smaller ESs than studies with fewer subjects. However, the subjects in those four ESs were high IQ students and, therefore, a smaller ES is to be expected.

Eighteen ESs were from studies on second grade students. Individual intelligence tests were used to measure the effect of a variety of rewards, with the exception of money. Of the one achievement and four intelligence tests that were reinforced with money, all were individual exams given to fourth and fifth graders with average IQs. The most powerful reinforcer was giving the students a choice of the reward they desired (ES = .88). The least effective reinforcer was reproof (ES = .10).

Studies were coded True Experimental, Quasi-Experimental, or Pre/Post designs based on the definitions provided by Campbell and Stanley (1963). ESs were not significantly different across designs, although pre/post designs were below the mean ES (.50) at .37 standard deviation units. When the low quality studies (32%) were removed from the analyses, the ES decreased only slightly, to .46.

All pre/post designs (n = 4) were rated "low" quality. Quasi-experiments were coded "low" (n = 6) when various threats to external and internal validity were present, including statistical regression, poor matching techniques, volunteers, pretest sensitization, experimenter effect, and inconsistent or poor description. Three of the 23 true experiments were coded low quality because of the use of volunteers, experimenter effects, the Hawthorne effect, and the lack of population validity.

In all the reviewed studies, "novelty" was a rival hypothesis and 
posed the greatest overall threat to external validity. The 
reinforcement procedures implemented by the investigators were always 
novel experiences for the subjects. That is, the treatment consisted of 
providing activities not typically associated with standardized testing. 
Consequently, differences between reinforced and nonreinforced students 
may be caused by experimental students attending to the newness of the 
reinforcement activities rather than by higher motivation to do well on 
tests. 

Summary 

The results of the meta-analysis produced substantial evidence that reinforcement techniques result in higher standardized test scores. The overall effect size of studies comparing the results of reinforcement and nonreinforcement was .50 standard deviation units. Although the median (.29) was considerably lower, it was not used as a measure of central tendency because the effect of three high quality studies (ES = 1.98) was better represented by the mean.

A mean ES of .50 corresponds to a standardized test score increase of about 19 percentile points for typically achieving students and 16 percentile points for low achievers of average IQ, and low IQ students (ES = 1.10) would gain about 36 percentile points. Substantially smaller increases would be expected with high achieving and high IQ students.

Just over half of the effects (23/41) were from studies reporting that reinforcement was effective in increasing test scores. Ten of those used achievement tests and 13 used intelligence tests. Only five effects were from group tests (two achievement, three intelligence).

The two studies that examined the effect of reinforcement on group achievement tests had major methodological problems which prevented confident conclusions. However, the results of both studies did favor the reinforced students. Three studies that used group intelligence tests to examine the use of rivalry to "challenge" students into increasing their IQ scores had inconsistent results.

Younger students appear to be more easily influenced by reinforcement (ES = 1.00) than older students (ES = .46). All ESs from studies with second grade students (n = 18) used individual intelligence tests and ranged from -.26 to 2.69.

When the poor quality studies were removed, the ES decreased only slightly, from .50 to .46. The major methodological problems were the use of pre/post designs, poorly matched subjects, volunteers, and nonrandom assignment, as well as violations of external validity, including population validity, limited treatment description, and the Hawthorne effect. The type of design was unrelated to the magnitude of the effect size obtained.

All rewards (excluding reproof) that were investigated were effective in raising scores. Money was used as a reinforcer in five effect sizes with individual intelligence tests and was an effective agent in increasing test scores (ES = .61). No money rewards were provided with group or achievement tests.

In further breakdowns of study characteristics by IQ and category, IQ was clearly the most important differentiating factor. That is, the test scores of low IQ students increased more under reinforcement than the scores of high IQ students. Furthermore, the strongest effects of reinforcement were found with young (ages 4 through 6), low IQ (43 through 85) students. These results support the conclusions reached in the dissertation reviews by Baer (1978) and Weiss (1980).

Conclusions 

Much research has documented that major changes in behavior rates have been produced by the application of reinforcement principles. Yet there are few data to show that these procedures can be applied to one of the most important behaviors in education: performance on group-administered standardized achievement tests.

The meta-analysis of 41 effect sizes related to the impact of reinforcement on standardized test scores found reinforcement techniques to be effective in increasing test scores by .50 standard deviation units. Most of the previous research used individual intelligence tests, and only two group achievement test studies were located. Although generalizations from only two studies must be cautiously interpreted, the mean effect size of 1.10 lends support to the notion that providing students with reinforcement will increase their group achievement test scores.

However, these studies were of questionable value because of either a pre/post design or the poor matching of a small number of subjects (mean n = 31) from intact classrooms. Also, the lack of treatment description makes replication impossible. No large scale, high quality true experiments have examined the effect of reinforcement on group achievement test behavior.

The fact that group testing is so prevalent in the nation's schools, and that most students take at least one group achievement test per year until graduation, emphasizes the need for investigating the effect of various testing conditions on test scores. Research on the effects of providing reinforcement on student test-taking motivation during group standardized achievement testing is particularly necessary to address the following concerns:

1. The need for students to experience highly motivating situations in all school activities, including tests.

2. The elimination of motivation as an ambiguous and discriminating variable in test interpretation.

According to the meta-analysis data, an IQ score of 81 measured under reinforced conditions compares to an unreinforced IQ score of 60. For achievement tests, a reinforced percentile of 69 compares to an unreinforced percentile of 50. Since reinforcement appears to have a substantial impact on test scores, methods must be found to eliminate student motivation as a source of variance in test score comparisons.

More specifically, research is needed on the impact of reinforcement on group test scores. All previous research located on primary students has used individual testing. An examination of reinforcement techniques on the group achievement test scores of primary students is clearly an important step in furthering the understanding and interpretation of test-taking behaviors. Currently no high quality research studies demonstrate the effectiveness of reinforcing students on group achievement tests. Although results from individual testing show that reinforcement increases scores and may generalize to group testing, there are differences that should be examined.

For example, by its very nature, individual testing can encourage 
high student motivation due to the close proximity of the examiner and 
the ease of controlling undesirable effects (fatigue, illness, 
nervousness, and anxiety). The problems created by group testing (e.g., 
machine-scoreable answer forms and large group directions) are more 
difficult to overcome because of the large pupil/teacher ratio. 
Moreover, testing experiences that differ from the daily work are first 
encountered in the early grades. 

Based on the review of previous research, there is a need for a 
larger scale study to investigate reinforcement procedures on test 
scores. Such a study should meet the following conditions: 

1. Employ a known reinforcer. The study should not test the strength of the reward. Instead, the research should demonstrate the impact on test scores of using a known strong reinforcer.






2. Use a true experimental design. Experimental and control groups should be formed by randomly assigning whole classrooms, so that treatment conditions will be isolated from the nontreatment group. Also, there will be no need to pretest for matching, thus eliminating any "pretest sensitization."

3. Specify the "treatment." Any variable that is confounded with reinforcement procedures needs to be eliminated. However, the students should have experience in earning the reward before data are collected. The subjects need to believe that reinforcement is coming and know how it feels to be rewarded for some performance.

4. Use contingent, immediate reinforcement. Score improvement, not rank increase, should be reinforced to eliminate competition as an extraneous variable. Rewards based on the student's own score are more effective if they are delivered very soon after the test is taken.
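Conditions 2 and 4 can be made concrete in a short sketch. It assumes a list of classroom identifiers and individual previous and current scores; it randomly assigns whole classrooms (not individual students) to treatment or control, so no pretest matching is needed, and applies a reward rule based only on a student's own score gain, never on rank. All names and identifiers here are hypothetical.

```python
import random


def assign_classrooms(classroom_ids, seed=None):
    """Randomly split whole classrooms into treatment and control groups.

    Assigning intact classrooms keeps reinforced and nonreinforced
    students apart during testing and reward delivery.
    """
    rng = random.Random(seed)  # seed only for reproducibility
    shuffled = list(classroom_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": sorted(shuffled[:half]),
            "control": sorted(shuffled[half:])}


def earns_reward(previous_score, current_score):
    """Contingent rule: reward improvement on the student's own score,
    not a gain in rank relative to classmates."""
    return current_score > previous_score


groups = assign_classrooms(["4A", "4B", "4C", "4D", "5A", "5B"], seed=1982)
print(groups)
print(earns_reward(previous_score=41, current_score=47))  # True
```

With an odd number of classrooms, this split places the extra classroom in the control group; a real study might instead balance groups by grade level before randomizing.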

Meta-Analysis of Research on Training
Students in Test-Taking

The fact that it may be possible to raise students' test scores by training students to take tests is important, since test results are used as a basis for educational decisions. For example, the limited number of slots available in some special programs (e.g., special education and Title I) requires that students score below a certain test score criterion. Additionally, test scores are important to entrance and exit requirements for colleges, graduate schools, and vocational institutions. Licenses for driving, specialized teaching, or practicing medicine and law are also awarded based on test scores.
Test taking is a critical survival skill in today's society. Whether this skill is learned by experience or through instruction is an issue currently facing educators. Because of the multiple-choice, machine-scorable answer formats of most group standardized tests, unique skills are required of students who are expected to demonstrate mastery of the information contained in the test. Among these behaviors are the elimination of obvious distractors and systematic guessing: skills which are not necessary for answering the open-ended or single-response questions most frequently used in instructional settings.

This section reviews the research which has examined the effect on test scores of training students to take tests. First, the test training components will be defined. Second, previous reviews of the test-taking literature will be examined. Next, two typical studies on training students will be described. The results of the meta-analysis conducted on primary research in the area will then be presented. Finally, a summary and conclusions of the meta-analysis findings will be given.

Definition of "Training" 

Three types of training have been investigated by researchers concerned with the degree to which test-taking skills contribute to student test scores: practice, coaching, and training in test wiseness (TW). In this review the term training refers to any prior exposure of the students to a testing situation, including any combination of the three components.

Practice. Test/retest experiences with identical, [illegible] similar, or dissimilar forms have all been referred to as practice (Vernon, 1954). It is the lack of instructional feedback that distinguishes practice from coaching or training in TW. Practice is a type of "training" because it is possible for students to "teach" themselves, or learn from prior experiences.

Anastasi (1976) theorized that certain types of questions may be much 
easier to answer when encountered a second time. For example, some 
problems may require insightful solutions which can be reapplied in 
solving the same or similar problems on a retest. The individual who has 
extensive prior experience in taking tests may have an advantage in test 
performance over one who is taking the test for the first time (Heim & 
Wallace, 1949, 1950; Millman, Bishop, & Ebel, 1965; Rodger, 1936). 

Coaching. Prior to the 1950s, the term "coaching" was used to describe the technique of telling students the right answers on a test and then giving them hints on how to improve their performance (Vernon, 1954). The term became synonymous with "training in TW" as it was popularized in the 1950s. In this study, training in TW is broader in scope and is used to incorporate all aspects of coaching as well as some form of practice on item formats.

Training in test-wiseness (TW). In recent years, the rubric "test-wiseness" (TW) has been used to describe the variables used in constructing instructional programs to teach test-taking skills. Thorndike (1951) first suggested that TW may influence the validity of a test. TW, as a skill independent of content knowledge, has been defined by Millman, Bishop, and Ebel (1965) as "a subject's capacity to utilize the characteristics and formats of the test and/or the test taking situation to receive a high score" (p. 707).
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The major skill divisions of TW have been outlined in a taxonomy .by 
Millman et al . (1965) and include strategies for time-use, error 
avoidance, guessing, deductive reasoning, intent (of test constructor) 
consideration, and cue-use. 

Rowley (1974) contends that the frequent use of multiple choice
tests has produced "test-wise" students, who receive higher scores
than other students with the same knowledge. TW is not
a general trait but appears to be "cue specific" (Diamond & Evans, 1972). 
For instance, TW students will use grammatical cues to "guess" the 
correct answer: a question with a plural verb form will be matched with 
an answer that has a plural verb rather than with an answer having a 
singular verb. 

Certain tests are more susceptible to TW than others. For example, TW
accounted for 25% of the variance in the vocabulary test scores of ninth
grade students because of cues in the items (Scheib, 1979).

Novel situations, in particular, discriminate between the TW student
and the non-TW student (Ebel, 1976). Millman and Setijadi (1966)
demonstrated that students taking a test with a familiar format do better
than students who are unfamiliar with the format.

Experimental studies have shown that TW can be learned through
specific training or through test-taking experience (Gibb, 1964; Moore,
Schutz, & Baker, 1966; Slakter, Koehler, & Hampton, 1970). Crehan,
Koehler, and Slakter (1974) found statistically significant increases in
TW each year up to the ninth grade, without training. When TW was
examined over a four year period, it was found to be a stable
characteristic from junior high through graduation (Crehan, Gross,
Koehler, & Slakter, 1977, 1978).
Tests to support the existence of TW have been developed by Ferrell
(1972) and Woodley (1975). The correlations are statistically
significant between measured TW and performance on achievement tests with
multiple choice items (Alker, Carlsen, & Hermann, 1969; Ferrell, 1977;
Rowley, 1974) and between TW and GPA (Millikin, 1976), but are not
statistically significant between TW and cognitive abilities (Diamond,
Ayrer, Fishman, & Green, 1976).

Ferrell (1977) argued that all students should have formal
instruction in test taking to minimize the advantage test-wise students
have. Techniques for teaching TW have been developed by several
investigators, who have found that scores on TW scales consistently
increase with training (Gibb, 1964). Evidence for increases in
achievement test scores, however, is conflicting (Callenbach, 1973;
Moore, Schutz, & Baker, 1966; Oakland, 1972; Slakter, Koehler, & Hampton,
1970).

Several commercial products specifically designed to train students
in TW have been marketed since 1978. Three of these training packages
are Competency Tutoring Program (1979), Mini-Tests (1979), and Test
Taking Skills Kit (1980). The information available from the publishers
indicates that little empirical data have been collected to determine the
effectiveness of the packages in teaching TW. The major problem with the
research that has been conducted is that the comparison groups
systematically differed in factors other than treatment implementation.
The control group in all studies was formed from schools that did not
"volunteer" to purchase the kits. Consequently, there may have been less
emphasis on test-taking skills in the control schools than in the
treatment schools. Schools that purchased the kits obtained
statistically significantly higher test scores than those that did
not.

Previous Reviews 

Five review articles were located that discussed the primary
research on the effect of training students in test-taking skills (Fueyo,
1977; Millman, Bishop, & Ebel, 1965; Roberts, 1979; Sarnacki, 1979;
Vernon, 1954). A sixth review (Jensen, 1980) was located but not
included because, instead of discussing primary research, the author
summarized other previous reviews. Table 1 contains a brief description
of each review. The number of research studies reviewed in the five
articles ranged from 8 to 20, with a mean of 14. Two articles listed one
dissertation each in their references.

The review articles illustrated the common faults described by
Glass (1977): (a) haphazard literature searches, (b) outcomes not
quantified for comparisons across studies, and (c) the inappropriate use
of statistical significance to integrate findings. No author (except
possibly Vernon, 1954) reviewed all the literature in the field, and none
reported the criteria for selecting articles or the method of sampling.
The use of only two dissertations suggests that at least one major source
of research was not searched.

Four reviewers did not quantify their findings by using a common
metric to compare results across various research conditions.
Statistical significance was reported only occasionally and
unsystematically; in no case was this information used to
integrate similar conclusions or to compare findings. The reviewers
formed conclusions by summarizing the conclusions of the principal
investigators rather than by quantifying and systematically analyzing
study outcomes. Only one reviewer critically analyzed the primary
studies for design and methodological problems and recommended
improvements.

In the earliest work, Vernon (1954) prepared the best critical
review of previous research on the effect of practice and coaching on
intelligence test scores. Unfortunately, this article was completed
before the majority of the primary research in the area was undertaken
(1960 to 1975). Vernon also reviewed the largest number of studies
(i.e., 20), which reveals that the other articles, printed 10 to 20 years
later, omitted relevant research.

To facilitate comparisons of results across studies, Vernon 
translated the published data from the reviewed articles into standard 
scores (IQ). However, the translated scores were never integrated nor 
analyzed for covariance with study characteristics. The major criticisms
made by Vernon of the primary work that he reviewed are listed below.

1. The description of "treatment" did not distinguish between 
practice and coaching. 

2. Most studies used pre/post designs; control groups were rarely
used.

3. Researchers did not report if the treatment was conducted on 
identical, parallel, similar, or dissimilar forms. 

The other four reviews considered research on the effect of test
wiseness (TW) on standardized tests (mostly achievement tests). The
authors of all five articles concurred that, in general, practice,
coaching, or training in TW will increase test scores. Other conclusions
drawn by the reviewers are listed below:

1. Training is not differentially influenced by age and sex. 

2. Retesting with the same form results in higher scores than if a
parallel form is used.

3. Certain subtest scores are more affected by training than other
subtests (e.g., larger increases were found on nonverbal and spatial test
items than on verbal test items).

4. Short practice exercises that immediately precede the tests are
not effective in increasing scores.

5. The time between training and testing is critical (i.e., the 
longer the interval, the less increase in test scores). 

6. Increases in test scores due to training in TW fade more quickly 
than increases due to practice. 

7. Training in TW is more effective than practice alone. 

8. TW can be acquired by students through multiple testing or
taught by teachers who deliberately coach specific skills.

9. Initially, TW accumulates rapidly, but a definite ceiling
exists.

These conclusions must be viewed with caution since the studies
included in previous reviews were neither comprehensive nor
representative. Also, the findings from primary sources were summarized
without systematically considering the impact of differences in study
characteristics. For instance, no analyses were performed on the effect
on outcomes of number of subjects, type of test administered, age of
subjects, and quality of research design. Therefore, the reviewers'
conclusions are based simply upon the original authors' conclusions.

In the present study, the outcome data of all studies were converted
to ESs, which were analyzed for covariation with study characteristics.
Therefore, the summary and conclusions will not be a tabulation of the
primary investigators' opinions but will result from a quantitative
examination of how study variables differentially affect the outcome
data across studies.
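A minimal sketch of that covariation step, assuming each effect size has been coded as a record with one field per study characteristic: group the ESs by a characteristic and report the number of effects, mean ES, and SD per category, the layout used in Table 5. The field names and the four records are hypothetical, chosen only to show the mechanics.

```python
from collections import defaultdict
from math import nan
from statistics import mean, stdev


def summarize_by(effects, characteristic):
    """Group coded effect sizes by one study characteristic.

    Returns {category: (number of effects, mean ES, SD)} with the mean and
    SD rounded to two decimals; SD is NaN when a category has one effect.
    """
    groups = defaultdict(list)
    for record in effects:
        groups[record[characteristic]].append(record["es"])
    summary = {}
    for category, values in groups.items():
        sd = round(stdev(values), 2) if len(values) > 1 else nan
        summary[category] = (len(values), round(mean(values), 2), sd)
    return summary


# Hypothetical coded records (illustrative values, not the report's data).
effects = [
    {"design": "pre/post", "quality": "low", "es": 0.90},
    {"design": "pre/post", "quality": "low", "es": 0.64},
    {"design": "true experimental", "quality": "high", "es": 0.30},
    {"design": "true experimental", "quality": "high", "es": 0.38},
]

for category, row in summarize_by(effects, "design").items():
    print(category, row)
```

The same function applied to "quality", or to any other coded field, produces one block of rows of the summary table.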


Typical Studies

Two investigations are described to portray the most common
characteristics of the studies reviewed in this section.

Practice. One group of students was administered one test on two
different occasions, one week apart. The same form and level were used
in both instances. The mean pretest score was compared with the mean
posttest score, and any increase was attributed to the effect of
"practice."

Training in TW. Students were randomly assigned to experimental and
control groups. Both groups were given pre- and posttests. Between the
tests, the experimental group was trained in skills that apply to taking
exams: how to guess, fill in answer formats, eliminate distractors, and
schedule time. The same test form or similar forms were administered to
the students at pre- and posttest. The mean control group posttest score
was then compared with the mean treatment group posttest score to
determine if training had a positive effect (increase) on test
scores.
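The two typical designs yield effect sizes in slightly different ways. The sketch below assumes a Glass-style definition, a mean difference divided by a reference-group standard deviation; this section does not state the exact formula used, so the standardizers chosen here (the pretest SD for the practice design, the control-group SD for the experimental design) are assumptions, and the scores are hypothetical.

```python
def es_pre_post(pre_mean, post_mean, pre_sd):
    """Practice design: one group tested twice.
    ES = (posttest mean - pretest mean) / pretest SD (assumed standardizer)."""
    return (post_mean - pre_mean) / pre_sd


def es_experimental(treatment_post_mean, control_post_mean, control_sd):
    """Randomized TW design: compare posttest means of the two groups.
    ES = (treatment mean - control mean) / control-group SD (Glass-style)."""
    return (treatment_post_mean - control_post_mean) / control_sd


# Hypothetical means on a test whose SD is 15 points.
print(es_pre_post(pre_mean=100.0, post_mean=106.0, pre_sd=15.0))
print(es_experimental(104.5, 100.0, 15.0))
```

Because the pre/post estimate has no control group, any gain from history, maturation, or testing is folded into it, which is one reason pre/post effects are treated as lower quality evidence.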

Results of the Meta-Analysis

Sixty-two effect sizes were generated from 37 research studies
which examined the effectiveness of training students in TW or providing
practice on taking tests. A summary listing of ESs by study is included
in Appendix A. The studies included 34 articles published from 1924
through 1979 and three dissertations completed from 1976 to 1977.

Overall effects. Descriptive statistics were used to compare the
results of training students or not training students to take
standardized tests. A total of 62 effects was calculated to describe the
impact of training students in test-taking skills on standardized test
scores. For the 20 practice ESs, the investigators concluded in 15 cases
that treatment increases test scores and in 5 cases that it does not. The
investigators of the 42 training in TW ESs concluded that the treatment
worked in 31 cases, did not work in 5 cases, and was inconclusive in 6.
Such a rough tally seems to support the use of either practice or
training in TW to obtain higher test scores, but a much more thorough
analysis is possible.

Across all 62 effect sizes, the mean effect size was .62 (median ES
was .46), with a standard deviation of .68 and a standard error of .09.
This means that, on the average, trained students scored .62 standard
deviation unit (or 23 percentile points) above the untrained students on
a standardized test. According to JDRP (1977), an ES of this magnitude
constitutes a large gain. Cohen (1977) describes an effect size of .50 as
medium and .80 as large; therefore, .62 indicates quite a powerful
impact.
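The percentile translation and the standard error can be checked with a short sketch. It assumes normally distributed scores, under which an ES places the average trained student at the normal-curve percentile of that z-score, and it uses SE = SD / sqrt(n) with n = 62 effect sizes.

```python
from math import sqrt
from statistics import NormalDist


def es_to_percentile(es):
    """Percentile rank of the average trained student in the untrained
    distribution, assuming normally distributed scores."""
    return 100 * NormalDist().cdf(es)


def standard_error(sd, n):
    """Standard error of a mean of n effect sizes with standard deviation sd."""
    return sd / sqrt(n)


print(round(es_to_percentile(0.62)))       # 73, i.e., 23 points above the 50th
print(round(standard_error(0.68, 62), 2))  # 0.09
```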

While the majority of the ESs ranged from -.25 to .75, nearly two
thirds (40/62) were over .25. Nearly one third of the effects (18)
reported a substantial ES of over .75 (see Figure 2). Although the
distribution of ES is positively skewed, the median (.46), mode (.50), and
mean (.62) support the central tendency for trained students to score
approximately one half standard deviation unit higher than the untrained
students. In this case, the median may be a better indicator of the
overall ES because the nine ESs that are over 1.00 are all of low quality
and they may inflate the mean (see Appendix A).

Table 5 shows the average ES for each of the study characteristics 
coded in the meta-analysis. 

Quality and design. Low quality studies accounted for 73% of the
ESs. The most common problems associated with the low quality studies
were unspecified treatments, experimenter bias, and the use of pre/post
designs. Treatments of practice or training in TW were inadequately
defined in 65% (40/62) of the ESs. For example, it was often impossible
to determine whether identical, similar, or different forms of the test
were used, how long the treatment lasted, or what training components
were used.

Examiner bias occurred when test administrators were aware of the
experimental conditions or when the same persons conducted the practice 
or training in TW and also administered the test (24% of the ESs). 

Studies using a pre/post design (65%) resulted in a considerably
higher mean ES (.77) than those using experimental designs (ES = .35). Due
to inherent design problems, all pre/post studies were coded low quality
and accounted for 89% of the ESs in the poor category. The internal
validity of studies coded "low" was threatened because no-treatment
control groups were not used with the pre/post designs and extraneous
variables of history, maturation, and testing were not controlled
(Campbell & Stanley, 1963).

Table 5

Categories for Describing Student Training Studies, Number of Studies
in Each Category, and the Mean Effect Size

                                                     Number
Characteristic           Categories                 of effects   Mean ES    SD

Number of subjects       9 - 49                         13         1.15     --
                         50 - 99                        20          .67    .47
                         100 - 199                       9          .35    .32
                         200 - 705                      13          .37     --
                         Over 1000                       7           --    .24
Age of subjects          5 - 10                          8           --    .49
                         11 - 14                        22           --    .56
                         15 - 18                        10          .52    .77
                         19 - 24                        18           --    .83
                         25 - 40                         4          .44    .44
IQ of subjects           65 - 89                         2         1.90    .79
                         90 - 114                       37          .47    .42
                         115 - 120                      23           --    .89
Type of training         Practice                       42           --    .69
                         Test wiseness                  20          .41    .60
Type of test             Achievement                    30          .40    .31
                         IQ                             32          .82    .85
Unit of administration   Individual                     16         1.12    .63
                         Group                          46          .45     --
Design type              True experimental              17          .36    .33
                         Quasi-experimental              5          .31    .39
                         Pre/post                       40          .77    .77
Quality of research      High                           17          .32    .31
                         Low                            45          .73    .67
Conclusions              Training worked                46          .78    .72
                         Training did not work          10          .09    .11
                         Inconclusive                    6          .24    .08
Overall                                                 62          .62    .68

Note. -- indicates a value that is illegible in the source copy.

When the poor quality studies (n = 45) were removed from the
analysis, the mean ES became .32. True experiments accounted for 16 high
quality ESs (ES = .34) and quasi-experiments for 1 ES (ES = .06). These
data indicate that training is a powerful influence on test scores
because, even when only the best, most rigorous studies were considered,
typical students increased their scores from the 50th to the 63rd
percentile after treatment.
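The .32 figure is a weighted combination of the subgroup means, with each subgroup counting in proportion to its number of effect sizes. A short sketch reproduces it from the 16 true-experiment ESs (mean .34) and the single quasi-experimental ES (.06), then converts it to a percentile under an assumption of normally distributed scores.

```python
from statistics import NormalDist


def pooled_mean(groups):
    """Weighted mean ES from (number_of_effects, mean_es) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total


high_quality = [(16, 0.34), (1, 0.06)]  # true experiments, quasi-experiment
es = pooled_mean(high_quality)
print(round(es, 2))                       # 0.32
print(round(100 * NormalDist().cdf(es)))  # 63
```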

Type of training. Studies providing practice in test taking
described larger effects than studies that trained students in TW (Table 
5). Some of the large impact of practice can be attributed to the 37 
pre/post designs used to investigate the effect of practice (ES = .76). 
Thus, quality of research design, rather than the type of training, may 
be responsible for the difference in ES. When only the high quality 
studies were considered, the effect of practice (ES = .32) was similar to 
the effect of training in TW (ES = .33) (see Table 6). 

Type of test and unit of analysis. As shown in Table 5, the 32 IQ
test effects had a higher mean ES (.82) than the 30 achievement test
effects (ES = .40). For most categories, IQ tests achieved a higher ES
than achievement tests.

To investigate the factors that contributed to the larger effect
sizes associated with IQ tests, ESs which resulted from studies with high
and low quality research designs were examined separately. When the low
quality ESs were removed, the effect of training on achievement tests
(ES = .31) was very similar to the effect on IQ tests (ES = .37; see
Table 7).

Table 6

Mean ES by Quality for Type of Training
and Number of Subjects

                                         Quality of research design
Characteristic          Category            High          Low

Type of training        Practice            .33 (4)       .76 (38)
                        TW                  .32 (13)      .57 (7)
Number of subjects      9 - 49              .66 (2)      1.24 (11)
                        50 - 99             .25 (6)       .84 (14)
                        Over 99             .29 (9)       .42 (20)

Note. Numbers in parentheses indicate the number of ESs.

Also, it is noteworthy that 84% (27/32) of the IQ test effects, as
opposed to 43% (13/30) of the achievement test effects, used a pre/post
design. When ESs from only true experimental studies were compared,
there was little difference between ESs obtained using IQ and achievement
tests.

When poor quality studies were eliminated from the analysis, only
three IQ test effects remained (ES = .37), and cautious interpretation is
needed for so few ESs. In this group of high quality IQ test effects,
practice had a lower ES (.28) than training in TW (ES = .40), whereas in
the overall analysis (high and low quality), practice was found more
effective than TW. With a mean ES of .37, typical students can increase
their IQ scores by 5.5 points (or 14 percentiles) with training. Only one
of the three high quality IQ test effects came from an individually
administered exam, and it resulted in a higher ES (.69) than the group
exams (.20).
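The IQ-point figure follows from multiplying the ES by the standard deviation of the IQ scale. The sketch assumes an SD of 15 (common for deviation IQs; some tests use 16), which gives 5.55 points, reported above as 5.5; the percentile figure again assumes normally distributed scores.

```python
from statistics import NormalDist

IQ_SD = 15  # assumed standard deviation of the IQ scale


def iq_gain(es, sd=IQ_SD):
    """IQ points corresponding to an effect size of es."""
    return es * sd


def percentile_gain(es):
    """Percentile points gained by a student starting at the 50th percentile,
    assuming normally distributed scores."""
    return 100 * NormalDist().cdf(es) - 50


print(round(iq_gain(0.37), 2))       # 5.55
print(round(percentile_gain(0.37)))  # 14
```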

An examination of the high quality achievement test ESs in Table 7
yields only two large differences among categories. Some variance from
the mean achievement test ES of .40 can be attributed to the ES of .48 for
the 16 low quality designs. As shown in Table 8, 15 out of 20 aptitude
test studies accounted for 90% of the low quality practice and 100% of
the low quality TW effects.

The five high quality aptitude test designs used 17 to 22 year old
students, and all the exams were group administered. A single high
quality aptitude study on the practice effect (ES = .83) was exemplary in



Table 7 

Mean ES by Type of Test and Quality of Research Design 
for Type of Training, Unit, Age, Design, and IQ 



Characteristic 



Achievement tests 
High quality Low quality 



Intelligence tests 
High quality Low quality



Type of training 
Practice 
TW
Unit 

Individual 
Group 

Age 

5-9 
11 - 14 
15 - 18 
19 - 24 
30 -40 
Design 

True experimental 
Quasi-experimental
Pre/post 

IQ 

65-89 
90-114
115 - 120 
Type of test by quality 
Type of test



.30 (11) 
.35 (3) 



.31 (14)

.25 (3) 

.37 (3) 

.23 (6) 

.45 (2) 

1.09 (1) 

.33 (13) 

.06 (1) 



.42 (9) 
.11 (5) 
.31 (14) 



.62 (10) 
.28 (6) 

.78 (1) 
.46 (15) 



-39 (1) 

.03 (2)

.65 (10) 

.22 (3) 

.78 (1) 

.03 (2) 

.52 (13) 



.18 (6) 
.65 (10) 
.48 (16) 



.28 (1) 
.42 (2) 

.69 (1) 
.21 (2) 



.40 (30) 



.89 (25)
.79 (4) 

1.13 (13) 
.67 (16) 

.95 (5) 

.21 (2) .49 (16) 

1.90 (2) 

.69 (1) 1.41 (6) 

.37 (3) 

.73 (2) 
.89 (27) 

1.90 (2) 
.37 (3) .64 (19) 

1.31 (8) 
.37 (3) .88 (29) 

.82 (32) 



Note. Numbers in parentheses indicate the number of ESs. 
Table 8

Mean ES for Studies Coded "Achievement"
by Quality, Test, Age, and Training

                               High quality               Low quality
Age        Training      Aptitude   Achievement     Aptitude   Achievement

6 - 7      Practice
           TW                         .25 (3)
13 - 24    Practice      .84 (1)      .10 (2)        .64 (9)      .39 (1)
           TW            .10 (4)      .37 (3)        .33 (3)
30 - 40    Practice
           TW                        1.09 (1)        .22 (3)

Note. Numbers in parentheses indicate the number of effect sizes.
that it was a true experimental design (a modified Solomon four-group
design; Campbell & Stanley, 1963). The traditional experimental and
control group model leaves questions unanswered in this type of
test/retest study because the pretest is itself the treatment. To solve
this dilemma, Lucas randomly assigned Australian high school students to
three groups (pretest only, posttest only, and pre/post test) and thereby
controlled internal threats to validity. Although the increase in test
scores due to practice was substantial, the measurement tool was a test
of inference, and the findings may not generalize to more typical
American aptitude tests.

The training in TW used in four high quality (experimental) aptitude
test ESs had little impact on the test scores. Although the difference
between experimental and control groups did favor training, the impact
(ES = .10) was too slight to conclude treatment effectiveness.

Nine ESs came from high quality studies using group achievement
tests, two with practice and seven with training in TW. An ES of .10 for
practice on achievement tests means that typical retested students would
increase their percentile rank by 4 points, and low and high achievers by
3 percentiles. With training in TW (ES = .42), typical students increased
their scores by 16 percentile points, and low and high achievers by 13
and 10 points, respectively.
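The percentile gains for typical, low, and high achievers can be sketched by raising a student's z-score by the ES and reading the normal curve. The starting points used here, the 50th percentile for typical students and the 20th and 80th for low and high achievers, are the untrained baselines given in the report's summary (50, 20, and 80); normality is assumed. With these choices the sketch reproduces the 4, 3, and 3 point gains for ES = .10 and the 10, 8, and 6 point gains reported for primary students at ES = .25. For ES = .42 it gives about 14 points for low achievers rather than 13, a one-point difference that may reflect rounding in the original calculation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def percentile_after(es, start_percentile):
    """Percentile rank after raising a student's z-score by es,
    assuming normally distributed scores."""
    z = STD_NORMAL.inv_cdf(start_percentile / 100)
    return 100 * STD_NORMAL.cdf(z + es)


def gains(es, starts=(50, 20, 80)):
    """Rounded percentile-point gains for typical, low, and high achievers."""
    return [round(percentile_after(es, p) - p) for p in starts]


print(gains(0.10))  # [4, 3, 3]
print(gains(0.25))  # [10, 8, 6]
```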

In summary, when considering the research from high quality designs,
training in TW appears slightly more effective in increasing IQ or
achievement test scores than practice. However, practice rather than
training in TW is more effective with aptitude tests. Only one high
quality effect (IQ test) was obtained from training in TW for an
individual exam (ES = .69).
Group achievement tests: subjects under 8 years old. Group
achievement testing often requires student responses different from those
required by other school work. Such responses are particularly
difficult for students encountering group tests and machine-scoreable
formats for the first time. Therefore, a separate analysis was
completed for the three experiments that investigated the effect of
training in TW on primary grade students.

Two effects were obtained from the same study (Oakland, 1972) and
represent the gain of treatment students over the control group on
posttest scores. Two posttests were given, one six weeks after the
pretest (ES = .36) and the other six months after the pretest (ES = .15).
Twelve 30-minute training sessions were taught by teachers over six weeks
to a random half of the students (control students received no training).
Training consisted of general test-taking skills required for readiness
tests, multiple choice formats, direction vocabulary, pagination,
independent work, marking answers, and left to right movement. Since the
students were prereaders, no cue-related strategies were taught. The
emphasis of the training was to familiarize the examinees with directions
and answer formats for standardized tests. The classroom teachers
administered the training and the tests, but they were not monitored
during the training or the testing to ensure that the specified
directions were followed. Consequently, teacher behavior during testing
may have been a rival hypothesis if the test administration changed as a
result of training the students.

In a second study (Callenbach, 1973), training in TW was given to a
random half (n = 24) of students matched on pretest scores from two
second grade classrooms. A statistically significant difference (ES = .23)
was found when comparing the standardized reading test scores of the
treatment group with the control group. Eight 30-minute lessons were
taught in four weeks by the investigator, who also administered the
posttest during the week following the training (five weeks after the
pretest). Training consisted of following specific directions and using
unique formats, as well as time-use and guessing strategies. Experimenter
bias was a potential extraneous variable in this otherwise well-designed
study.

The results of the two studies suggest that a month of short
training sessions in TW will increase student test scores over
nontrained student scores. However, because these are only two results,
cautious interpretation is required until the findings are confirmed by
more research.

The only major methodological problem in the two studies was the
failure to control for examiner effect. For instance, Oakland (1972) had
the classroom teachers both train for TW and administer tests. This
procedure raises questions about the influence of extraneous variables.
Did the student training indirectly teach the teacher more about test
administration? That is, did the difference in scores come from better
test taking or better test administration? Did the teacher display
behaviors during the test that were reminiscent of the training sessions,
thereby prompting the treatment students? Also, Callenbach (1973)
trained and tested the students himself and may have biased the test
administration in favor of the trained students.
In summary, several good quality studies found that group
achievement test scores were higher when students were given practice
and/or training in TW. In good quality studies with teenagers, higher
ESs were obtained by studies using training in TW rather than practice
alone. With primary students who are trained in TW, a 1/4 standard
deviation unit (10 percentile points) increase was found for typical
students, 8 percentile points for low achievers, and 6 for high
achievers. No studies have isolated the effect of training in TW from
test administration for young children.

Other characteristics. The quality of research design appears to be
the most powerful differentiating variable among studies on training.
Most of the variation found in Table 5 can be accounted for by the
quality of the research study. For instance, there was only one
substantial variation in ES as a result of the different ages of
subjects: a large ES was obtained by the 19 to 24 year old group. Of
those 18 ESs, 15 were from pre/post designs that investigated the effect
of practice (i.e., test/retest) on test scores, and 17 used college
students as the subjects (mean IQ = 116). The fact that most of these
ESs came from low quality research using subjects with higher than
average IQ scores suggests that age may not be as strong a determinant of
outcome as indicated by the effect sizes in Table 5. In fact, the most
reasonable conclusion from these data is that age is not an important
covariate in interpreting the research in TW.

At first glance, studies with fewer than 100 subjects had
considerably larger effect sizes than those with more than 100. However,
further breakdown (see Table 6) indicates that the ESs associated with
small studies may be confounded by the quality of research. When low
quality studies were removed from the ES computation, the effects became
smaller. The small number of available ESs from high quality studies
with fewer than 50 subjects makes conclusions somewhat tentative.
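The confounding can be made visible by pooling the Table 6 cells within each sample-size category and setting the result beside the high quality cell alone. The sketch uses the Table 6 means and counts; the count of 2 for the high quality 9 - 49 cell is not legible in the source and is inferred here from the 13 effects listed for that category in Table 5 minus the 11 low quality ones.

```python
# Mean ES and number of ESs by sample size and research quality (Table 6).
TABLE_6 = {
    "9 - 49": {"high": (0.66, 2), "low": (1.24, 11)},  # high count inferred
    "50 - 99": {"high": (0.25, 6), "low": (0.84, 14)},
    "Over 99": {"high": (0.29, 9), "low": (0.42, 20)},
}


def pooled(cells):
    """Weighted mean ES over (mean_es, number_of_effects) cells."""
    cells = list(cells)
    total = sum(n for _, n in cells)
    return sum(m * n for m, n in cells) / total


for size, cells in TABLE_6.items():
    overall = pooled(cells.values())
    high_only = cells["high"][0]
    print(f"{size:>8}: all studies {overall:.2f}, high quality only {high_only:.2f}")
```

The pooled values for the two smaller categories match the Table 5 means within rounding (1.15 and .67), while the high quality values alone are much smaller, which is the pattern the paragraph describes.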

The highest ESs were obtained by the studies using low IQ students
(see Table 5). However, both of these studies used a pre/post design and
were of low quality. The difference between the ESs of medium and high IQ
students (after removing the low quality studies) indicates that scores
from high IQ students (115 - 120) are less affected by training than
scores from medium IQ students (see Table 7).



Summary

On the average, training students in test-taking skills increases
test scores by .62 standard deviation unit. In the research reviewed, the
impact of training on test scores is shown by the percentile ranks of
trained versus untrained students: 73 versus 50 for typical students,
41 versus 20 for low achievers, and 92 versus 80 for high achievers.

When the analyses were limited to ESs from only high quality research
designs, the resulting ES was .32 (an increase of 13 percentiles for
typical students).

A further breakdown of the 62 ESs showed that training in TW was
more effective than practice in increasing IQ and achievement test
scores. For aptitude tests the reverse was true: practice was more
effective than training in TW. The effect of training is similar on IQ
and achievement tests. Small scale studies produced larger ESs than
large scale studies, scores on individual IQ tests were affected more by
training than scores on group IQ tests, and higher test score increases
resulted when the training materials more closely resembled the actual
tests. Intensive training, close in time to the test, resulted in the
highest score increases.

Two major methodological problems, other than use of pre/post
designs, were identified: (a) many interventions were not adequately 
described and (b) examiner bias may have resulted from having the same 
person train and test. 

Conclusions



The data provide evidence that educationally significant increases
in student test scores can be obtained through practice or training in
TW. The impact of training may make a considerable difference to
individuals at the borderline of selection for special programs.
Therefore, it is critical to understand the impact of various practice
and training strategies on student test scores.

Currently, no large scale (over 100 subjects), high quality
experimental studies have been conducted to determine the effect of
training primary students in TW skills. Two small experimental studies
which trained young students in TW reported an increase in group
achievement test scores. However, external validity was threatened by
the small number of subjects (fewer than 50; population validity), and
internal validity by the fact that the same person administered the
training and test (examiner bias).
It is recommended that further research be conducted which adheres
to the following conditions:

1) Conduct a large scale study. Including students from several
schools will increase the population validity.

2) Define treatment. Describe the type and amount of practice,
TW components, length, and forms used, so that replication and
secondary analyses can be conducted.

3) Use true experimental design. By randomly assigning students
to treatment and control groups, the internal threats to validity will be
reduced.
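Recommendations 1 and 3 can be combined by randomly assigning students to treatment and control within each participating school, so that every school contributes to both groups. The sketch below is one way to do this; blocking by school is a design choice added here for illustration rather than a procedure stated in the report, and the student and school identifiers are hypothetical.

```python
import random
from collections import defaultdict


def assign_within_schools(students, seed=None):
    """Randomly split each school's students into treatment and control.

    students: iterable of (student_id, school) pairs.
    Returns (treatment_ids, control_ids). With an odd number of students in a
    school, the extra student goes to the control group.
    """
    rng = random.Random(seed)
    by_school = defaultdict(list)
    for student_id, school in students:
        by_school[school].append(student_id)

    treatment, control = [], []
    for school in sorted(by_school):  # fixed order keeps a seeded run reproducible
        ids = by_school[school]
        rng.shuffle(ids)
        half = len(ids) // 2
        treatment.extend(ids[:half])
        control.extend(ids[half:])
    return treatment, control


# Hypothetical roster: three schools with ten students each.
roster = [(f"{school}-{i}", school) for school in ("A", "B", "C") for i in range(10)]
treatment, control = assign_within_schools(roster, seed=1982)
print(len(treatment), len(control))  # 15 15
```

Passing a seed makes the assignment reproducible, which helps when the allocation must be documented for replication (recommendation 2).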



Review of Research Related to 
Training Teachers in Test Administration 



A number of researchers have suggested that the test administrator
can influence the outcome of an examination through the type of behavior
he or she exhibits during testing. For instance, scores can be affected
if an examiner does not follow the directions correctly. Also, negative
attitudes can be subtly communicated to the students, who may then
perform in a less rigorous fashion (Messick & Anderson, 1970). If an
examiner views the test as an imposition, an unstandardized testing
situation may result, since time limits may not be followed, clues or
assistance may be given to students, or directions may not be given
completely.

Aside from a few conceptual articles, no studies were located that
directly addressed the effects on student test performance of training
teachers to administer group standardized tests. Since no empirically
based research was located, the studies reviewed in this section are
those that are related to test administration. These studies
provide background on training test administrators by demonstrating the
effect of testing factors that are typically controlled by the examiner.
The reviewed articles were located through the computer search that was
previously described, but did not meet the criteria for the meta-analyses
used in the Reinforcement and Training reviews.

Included in this review are studies that show the impact on student test
scores or test behaviors of manipulating various testing conditions. The
testing conditions chosen for review are those that can be, and
frequently are, controlled by the test examiner. The major categories of
testing conditions that are controlled by the test administrator and that
may vary within the realm of standardized procedures are student test
anxiety level, the examiner/examinee relationship, the degree of test
information given to students, the mechanics of taking tests, and
environmental factors. Excluded from this review are studies which
examine reinforcement or student training, which were reviewed in the
previous two sections, and those that focus on analysis of the testing
instrument.



Previous Reviews 

Due to the paucity of research on test administration training, no
previous review articles were located on the subject. However, two
previous reviews did report on studies that investigated the effect of
manipulating variables associated with testing. In one review, Sattler
and Theye (1967) discussed the results of 56 research articles on factors
affecting individually administered IQ test performance: departures from
standard procedures, situational variables, experimenter variables, and
subject variables. To summarize the findings, the reviewers reported the
number of statistically significant results.

Although this review did not systematically analyze the articles for 
methodological problems, the authors stated that the most common design 
deficiency was the failure to use a random sample of experimenters. Four
major conclusions were drawn: 

1. Minor procedural changes are more likely to affect specialized
groups than normal groups.

2. Children are more susceptible than college-age subjects to
situational variables, especially discouragement.

3. The examinee's level of experience is not a crucial variable.

4. The subject's anxiety level is related to test performance.

These conclusions on individual testing have limited generalizability to
group testing because the administration is different. For example, to
test individuals, examiners must often make subjective judgments in
recording answers and scoring forms. Group testing, on the other hand,
requires skill in maintaining control and motivating a large number of
people to act as a unit. Therefore, examiner behavior is likely to affect
individual testing differently than group testing.

A second review discussed research concerned with the effect of
testing on students. Kirkland (1971) reviewed 44 studies that
examined the degree to which situational variables affect test scores.
Studies were individually summarized in terms of increasing test scores,
but statistical significance was not reported. No conclusions were drawn,
and only studies that resulted in higher scores for the treatment group
were reported in the review. A critical analysis of methodology and
design was not conducted.

There is no indication that either Sattler and Theye (1967) or
Kirkland (1971) reported on all the primary studies in the field or used
appropriate sampling techniques to ensure representativeness. In
describing the state of the research, both reviews restated the
conclusions drawn by the primary investigators and made no attempt to
quantify the outcome measure by converting it to a common index.
Therefore, comparisons cannot be made across studies to determine the
relative impact of the treatments. Additionally, neither review discussed
the covariation of different study variables with the outcome.
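For illustration only: one common index of the kind these reviews did not compute is the standardized mean difference, or effect size, widely used in meta-analysis. The version shown (Glass's delta, chosen here as an assumption; neither review used it) divides the difference between the treatment and control means by the control group's standard deviation, so that outcomes reported on different tests can be placed on one scale. The numbers in the worked line are hypothetical.

```latex
\Delta = \frac{\bar{X}_{T} - \bar{X}_{C}}{s_{C}},
\qquad \text{e.g.} \quad
\Delta = \frac{52 - 48}{10} = 0.40
```

Here the barred X terms are the treatment and control group means and s_C is the control group standard deviation. A value of 0.40 would mean that the average treated student scored 0.40 standard deviations above the average control student, a figure that can be compared across studies regardless of which test each one used.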

Studies from the references of the two previous reviews were
combined with those located during the computer searches to provide some
background information on the training of test administrators. Since the
studies represent a conglomeration of varied treatments, the research has
not been integrated or compared in the present review. Instead, this
review merely describes the trends in previous research that support the
use of various examiner training components. Unless otherwise stated,
all of the studies reviewed found statistically significant differences
in favor of the intervention.

Test Anxiety

The study of test anxiety began in 1952 with Mandler and
S. Sarason's investigations into the correlates of high anxiety during
examinations. The highly test-anxious person attends more to self-relevant
factors (e.g., the consequences of failing the examination) than to
task-relevant factors (e.g., the elimination of obvious distractors on a
multiple-choice test before guessing) and as a result is unable to
demonstrate the extent of his or her skills or knowledge (I. Sarason,
1978; Wine, 1971).

Since a constellation of behaviors comprises test anxiety, it is
difficult to document the complex condition with a single observational
measure. Of necessity, descriptive data on test anxiety are derived
from self-reports as well as from simultaneous measures taken with
other instruments. Self-reports consist of students responding to a
single question or to a set of many questions regarding their feelings
about test taking.

Using a single response item, Baird (1977) polled 4,248 college
students after they had taken the GRE, LSAT, or MCAT, and found that 50%
said they had been nervous while taking the test. Multiple-response
measures used to provide evidence of test anxiety are often screening
devices such as the Test Anxiety Scale for Children (S. Sarason, Davidson,
Lighthall, Waite, & Ruebush, 1960), Defensiveness Scale for Children
(Ruebush, 1960), Inventory of Test Anxiety (Osterhouse, 1972), and Test
Anxiety Scale (I. Sarason, 1978).

In a typical study designed to document the effect of test anxiety,
high-anxious (HA) and low-anxious (LA) students are identified by using a
particular screening measure. The validity of the high and low anxiety
classification scheme is established by correlating the screening results
with student performance on other measures.

To illustrate, in a study by Hill and Eaton (1977), the behavior of
prescreened HA and LA middle-school students was observed while they
worked on addition problems under time and failure pressures. HA
students were found to take twice as long per problem, make three times
as many errors, and cheat twice as often as LA students. However, when
HA students in a related study operated under success conditions with no
time limit, their solutions were accurate and their pace was more rapid
(Hill, 1967).

Students' scores on test anxiety scales have been correlated with
scores on academic measures such as intelligence, academic, and
diagnostic tests. For example, Kestenbaum and Weiner (1970) found that
reading performance correlated positively with scores on achievement
motivation measures but negatively with measures of test anxiety.
Steininger, Johnson, and Kirts (1964) have linked high test anxiety with
cheating. Data from college students questioned on attitudes about
cheating revealed that students tend to feel that cheating is justified
when situations are anxiety or hostility provoking. Steininger et al.
concluded that tests viewed as senseless (without purpose) tended to
evoke hostile, anxious feelings.

Based on the results of the reviewed studies, it appears that
certain students are provoked into anxious feelings when presented with
an examination. The extent of debilitation that test anxiety places on
student test performance and methods for controlling anxiety are examined
in the next two sections. Specifically, two questions are addressed:
(a) Does test anxiety influence test scores? (b) Can test anxiety be
controlled?

Does test anxiety influence test scores? The relationship between
anxiety level and test performance has been investigated from two
perspectives: (a) students' self-perceptions of their emotional state,
and (b) observations of student behavior. Outcomes on these two measures
are frequently confounded by subject selection and classification, type
of treatment, and type of dependent measures. Studies that focus on the
anxiety/test score relationship generally rely on self-report measures
for classifying students as HA or LA and use an objective academic test
as a correlate. Many researchers have found test scores to be negatively
correlated with anxiety level (Alpert & Haber, 1960; Butler, 1980;
I. Sarason, 1957; I. Sarason, 1963). In studies using factorial designs,
research has repeatedly demonstrated that highly anxious students at all
grade levels receive significantly lower test scores than low-anxiety
students (McCandless & Castaneda, 1956; McCoy, 1965; Zigler, Abelson, &
Seitz, 1973).

Paul and Eriksen (1964) analyzed an anxiety/test score paradigm and
found an interaction between anxiety level and testing conditions. That
is, a certain amount of anxiety is generally beneficial to test
performance, while a large amount is detrimental. When classified by
anxiety level, individuals who were usually LA benefited from test
conditions that aroused some anxiety, while those who were HA performed
better under more relaxed conditions.

Support for anxiety as one determinant of test scores was
demonstrated by Hill (1967), who examined the effects of social
reinforcement given to 7-year-old students for marble sorting. The
highest performance was obtained after success for reinforced LA students
and after failure for reinforced HA students. Therefore, the use of
reinforcement may have a differential effect on test results according
to the degree of anxiety and attitude toward the test.

The negative aspects of high test anxiety that result in low scores
have been attributed to students failing to attend to relevant tasks,
thinking irrelevant thoughts, and arousing emotions that interfere with
performance (Alpert & Haber, 1960; Mandler & S. Sarason, 1952; Paul &
Eriksen, 1964; I. Sarason, 1962).

Marlett and Watson (1968) reported that HA students spend part of
testing time worrying about their performance or how others are doing,
and often repeat solutions to problems. Other research has demonstrated
that test-anxious students whose anxiety is highly debilitative exhibit
high pretest anxiety, poor attention, fixation on mistakes,
self-criticism during testing, low academic self-perception, and no use
of mental imagery during examinations (Couch, Turner, & Garber, 1979;
Deffenbacher, 1978).

Nunn (1976) found a strong tendency for HA students to assign
personal control to others rather than to themselves and, as a result,
not to try for high scores. Downey (1977) found that an "I can beat
the test" attitude accounted for higher scores among students at similar
skill levels.

The effect of previous failure and success appears to be an
important explanatory variable within the HA/LA structure. In studying
the academic performance of high school students, Osier (1954) observed
that continual failure depressed pupil performance during examinations.
Lazarus and Eriksen (1952) found that successful college students with
high grade point averages (GPA) tend to perform better on tests under
stress. Those with a low GPA had lower test scores under stress.

In a review of research on the relationship between test anxiety 
and test performance, Hill and S. Sarason (1966) concluded that highly 
structured testing procedures systematically underestimate the abilities 
and achievement of many anxious children with histories of failure in 
school. Even when failure has not occurred but is a strong possibility,
the HA student will often falter on easy tasks (Eaton, 1979).

Can anxiety be controlled? While considerable attention has been
given to determining the best strategy to use in reducing anxiety, most
studies have demonstrated the effectiveness of treatment by using
anxiety scales both as screening devices and as the dependent variable,
rather than using test scores as the outcome measure (Parker, 1980).

In typical studies, a self-report measure of test anxiety is
administered to students before and after the implementation of a
treatment designed to alleviate the debilitating emotional arousal
brought on by an impending test. A treatment of desensitization or
relaxation techniques is applied, and the before and after self-report
scores of treated students are compared with the scores of the control
group to provide evidence of effectiveness.

A study by Lent and Russell (1978) typifies the research on
programs that are designed to reduce test anxiety. Before and after a
desensitization and study skills treatment, self-report instruments and
a simulated examination were administered to anxious college students.
Students in the treatment group demonstrated significant improvement over
students in the no-treatment group on all self-report measures, but there
were no differences on the academic tests. One explanation may be that
the tests (anagrams and digit symbols) were not sensitive to changes in
anxiety levels. However, this explanation is partially refuted by results
from I. Sarason (1973), who found that LA college students perform at a
higher level in solving anagrams than HA students. Considering the
findings of Lent and Russell and of I. Sarason, it appears that treatment
may reduce students' perception of their anxiety but influences neither
the anxiety level itself nor the effect of high anxiety (i.e., low test
scores).

In investigating various methods for alleviating test anxiety,
researchers who have used test scores as a dependent variable have found
no statistically significant increase in test scores (Arnold, 1979;
Friedman, 1979), even when GPA (Holroyd, 1976) or test-taking skills
(Meichenbaum, 1972) have improved.

It is important to note that the treatments reviewed in this section
attack the anxiety but not necessarily its cause.
For instance, if students are anxious because the test format is
unfamiliar and relaxation techniques are provided, the test scores may
not rise, but the students may be more at ease.

In a recent review of research on test anxiety, Tryon (1980)
concluded from 85 studies that all treatments which reduce test anxiety
are effective according to self-report instruments. However, results
conflict when treatments are measured by academic performance
on objective tests. The most successful strategies have been those
directed toward the elimination of worry through desensitization while
providing study skills counseling.

Tryon (1980) located five studies using achievement tests as
outcome measures, but the treatment group differed significantly on the
outcome measure from the nontreatment group in only two of them.
Four out of 12 studies found the treatment effective in reducing
intelligence test anxiety.

Research design flaws may account for some of the variation in the
findings of different researchers using academic tests as an outcome
measure. Both Allen (1972) and Tryon (1980) reported that the quality
of research design appears to be negatively correlated with treatment
effect. The most common design problems found in research on test
anxiety were the lack of credible, random placebo and control groups,
therapist effects, the use of volunteers, and ill-defined, complex,
confounding treatments.

Summary. There is evidence that anxiety is associated with lower
test scores. Since high-anxiety students tend to have lower test scores
than low-anxiety students, achievement test results of HA students may
be invalid indicators of academic skills or underestimates of their
knowledge.

Studies which investigate ways of decreasing anxiety (and thus
reducing the effect of extraneous variables that may confound test
interpretations) have usually used self-report measures to demonstrate
treatment effectiveness or have found statistically nonsignificant
differences between treatment and control groups on academic test
scores.

Several explanations can be offered for these findings. First,
treatments currently used may not be effective in controlling anxiety;
they may affect only the subjects' perception of anxiety. Second,
anxiety may be reduced, but students may continue to display poor
test-taking skills. Third, a treatment may be so closely tied to the
outcome measures that, although the anxiety level is not reduced,
subjects become aware of the "correct" response to make on self-reports
during the second administration. Fourth, the academic measures used in
some studies are very short (one or two subtests), and the skill range
may be too small to detect score differences due to lower anxiety levels.
Finally, it may be that current treatments to lower test anxiety do not
raise test scores because the underlying causes are not treated.
Anxiety may result from unfamiliar test formats, strange examiners,
previous failures, lack of test-taking skills, or a general
misunderstanding of test directions.

Because the relationship between high anxiety and low test scores
has been documented, further research is warranted to determine how to
obtain measures of student achievement without the influence of anxiety.
In this regard, it behooves test administrators to reduce anxiety levels
if students are to obtain valid, interpretable test scores. Though only
indirectly associated with anxiety, some techniques have been
demonstrated to be effective in raising test scores. The next sections
describe procedures that test administrators should consider in order to
obtain more accurate test results. Many of these techniques may be
applied within standardized conditions. The fact that these examiner
behaviors are not specified in test manuals encourages uncontrolled
variation in test scores that is not attributable to differences in
academic skills.

Examiner/Examinee Relationships

The importance of providing positive testing experiences is
demonstrated in the literature by the low test scores which result when
examiners who are strange, unfamiliar, negative, or punishing are used.
Test manuals usually recommend that examiners establish rapport with
students before testing, but rarely specify the procedures for
establishing such a relationship. In recent years, investigators have
examined the impact of examiner characteristics on test scores. For
example, Masling (1960) found that test results varied systematically as
a function of the examiner/examinee relationship. These differences may
be related to the personal characteristics of the examiner, such as sex,
race, personality, or appearance (Stoneman & Gibson, 1978).

Gender of test administrator. Some researchers have shown that
examiner gender influences test scores (Cieutat & Flick, 1967). One
hypothesis is that elementary students are more familiar with female
teachers than with male teachers, and this may encourage higher test
scores under female test administrators. In testing this theory, Back
(1979) found in two related studies that the statistically significant
advantage in WISC scores obtained by female examiners over male
examiners was reduced to nonsignificance when male examiners spent 15
minutes with the children prior to testing.



Race of test administrator. Although studies examining the effect
of the examiner's race on test performance have produced conflicting
evidence, the statistically significant effects found in some of them
demonstrate that race is a potentially confounding variable (Katz,
Henchy, & Allen, 1968; Katz, Roberts, & Robinson, 1965; Thomas, Hertzig,
Dryman, & Fernandez, 1971). In a recent review of 16 well-designed
studies on race of the test administrator, Jensen (1980) found a
statistically significant interaction (race of teacher × race of
student) in only six studies. Because these results were inconsistent in
favoring same-race or different-race examiners, Jensen concluded that
race is not a source of test score variance.

Poise of test administrator. Even more subtle factors may
influence student performance. For instance, in giving instructions or
oral problems, teachers may encourage or discourage students by their
rate of speaking, tone of voice, inflection, pauses, and facial
expressions (Anastasi, 1976; Wickes, 1956). The examiner's behavior
before and during the test administration has also been shown to affect
test results. For instance, by displaying an expectation that students
will perform well, examiners may create a self-fulfilling prophecy
(Exner, 1966).

As early as 1949, Thorndike emphasized the importance of "presence"
in a test administrator. This attribute includes assurance, poise,
dominance, and a good speaking voice. To obtain and maintain control of
the testing situation, Thorndike insisted that a teacher be thoroughly
familiar with instructions, conscientiously follow the directions, know
the principles and purposes of testing, and exercise good judgement
approval group performed significantly higher than those in the
disapproval group. 

In a study comparing the IQ scores of students who were tested by
examiners using standardized conditions, Thomas, Hertzig, Dryman, and
Fernandez (1971) found that the nature of the examiner significantly
influenced test results. Scores were higher when students were tested by
a warm, friendly, encouraging examiner (who also spent more time with the
students before normal testing) than by examiners who made no effort
to create a positive environment.

While most studies found higher test scores associated with warm
and positive test administrators, Coleman (1978) demonstrated that some
types of personal interaction with teachers may be distracting to
students during testing. Sixth grade students experiencing a cold,
task-centered examiner did significantly better on group-administered
intelligence tests than students who experienced a warm, child-centered
examiner.

The type of rapport existing between examiners and students prior
to testing also influences test results. Emotionally or physically
disturbing the examinees immediately preceding an examination
significantly reduces test scores (McCarthy, 1944; Reichenberg-Hackett,
1953). Based on the premise that testing maximizes anxiety in children,
Piersel, Brody, and Kratochwill (1977) found that exposing students to
an affectively warm and rewarding pretest experience resulted in
improved test scores and reduced apprehension levels.

Familiarity of test administrator with students. The effect of
familiarity was examined in an early study by Sacks (1952). Ten-year-old
students were randomly assigned to three test administrators, two
familiar and one unfamiliar. One of the familiar test administrators
had established a poor relationship with the students, the other a good
relationship. Statistically significant differences indicated that
students with familiar positive test administrators obtained higher
scores than students with familiar negative examiners, who in turn did
better than students with unfamiliar examiners.

Negative prior testing experience. The effect of negative past
testing experiences was investigated by Davis, Peacock, Fitzpatrick, and
Mulhern (1969). Math test scores from two groups of college males were
compared to determine the effect of prior failure. Those students who
had failed on a previous test and had received negative feedback from
the examiner performed significantly lower on the math test than
students without such experiences.

Information About the Test

The degree to which examinees should be informed about the testing
situation (e.g., type of test, test format, content, use of test
results, scoring protocols, length of test, difficulty) has been debated
for several decades. One perspective emphasizes the danger of
instilling too much anxiety by overemphasizing the importance of test
scores in a student's future endeavors. On the other side, examinees
may not try to do their best if they have not been properly informed
about the test.

Advance notification. Although the effects of giving standardized
tests without some sort of previous announcement have not been
investigated, there is some evidence that students obtain higher scores
on teacher-prepared tests if an upcoming test is announced than if it is
not announced (Pease, 1930). Tyler and Chalmers (1943) found that the
average scores of junior high students increased substantially when they
were given specific notification that a weekly test would be given.

"Game" vs. test. The way in which tests are referred to has also
been shown to affect student test behavior. For example, when third
grade students in one study were told that they were to play a game, the
experimental group had significantly higher IQ scores than the control
group, who were told that they would be given a test (Strang, Bridgeman,
& Carrico, 1974). However, Orfanos (1979) found no significant
difference between students taking a test and students playing a game.
It was concluded that the subjects, fourth and seventh grade students,
were too aware of the nature of the test to be fooled into "playing a
game."

How an examiner introduces a test may also differentially influence
test scores depending on the students' emotional states at the time of
the examination. For example, Sarason and Palola (1960) found that
highly anxious college students who were told that the results of an
achievement test would reflect their intelligence and predict their
success in later life received lower test scores than students who were
told nothing. There was no difference in the scores of low-anxious
students.

Knowledge of item difficulty level. Information given to students
prior to the test about the difficulty level can assist test-wise
students in organizing their time on a speed test. If easy and
difficult items are randomly placed throughout the test, a good test
taker will answer all easy questions first, skipping the unknown items
for later consideration. If the questions are arranged sequentially,
easy to difficult, the test-wise student will proceed through the test
item by item. Kubiszyn (1979) investigated the effect of listing the
difficulty level of the questions next to each item. He found that
test-anxious students received higher scores when they knew the
difficulty level than when they did not. It was concluded that anxious
students are more relaxed in answering questions that are indicated as
being "easy." In addition, in a 1978 article, Huck hypothesized that
higher scores will result when students are told that an item is
"difficult" because they will read more carefully than they will if an
item is "easy."
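The two-pass strategy described above can be sketched as a simple ordering rule. This is a hypothetical illustration, not a procedure taken from Kubiszyn (1979) or Huck (1978): the function name `answering_order` and the "easy"/"hard" labels are assumptions standing in for the student's quick judgment of each item. Easy items are answered in test order on a first pass, and skipped items are revisited in test order on a second pass.

```python
from typing import List, Tuple


def answering_order(items: List[Tuple[int, str]]) -> List[int]:
    """Return item numbers in the order a test-wise student might attempt them.

    Hypothetical model: each item is (item_number, label), where label is the
    student's quick judgment, "easy" or "hard". Pass 1 answers easy items in
    test order; pass 2 returns to the skipped items, also in test order.
    """
    first_pass = [num for num, label in items if label == "easy"]
    second_pass = [num for num, label in items if label != "easy"]
    return first_pass + second_pass


# Difficulty randomly placed: hard items are skipped and revisited later.
mixed = [(1, "easy"), (2, "hard"), (3, "easy"), (4, "hard"), (5, "easy")]
print(answering_order(mixed))  # [1, 3, 5, 2, 4]

# Items already ordered easy to difficult: the rule reduces to item by item.
sequential = [(1, "easy"), (2, "easy"), (3, "hard"), (4, "hard")]
print(answering_order(sequential))  # [1, 2, 3, 4]
```

When items are already arranged from easy to difficult, the same rule produces the plain item-by-item order, which matches the behavior the text attributes to the test-wise student on sequentially arranged tests.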

Feedback on test performance. The effect of providing students
with feedback on how well they are performing on tests has been disputed
among researchers. In one study, giving students item-by-item feedback
on test performance depressed IQ scores (Piersel, Brody, &
Kratochwill, 1977). On the other hand, Benson (1980) found that
low-ability ninth grade students who were told the correct response after
each trial obtained significantly higher scores on a verbal ability test
than those receiving no feedback.

Variation in the method of dispensing feedback (i.e., positive or
negative) could account for the difference in results of the two studies
cited above. A study by Bridgeman (1974) illustrated how certain
feedback may act as a motivational variable to influence performance and
create a "self-fulfilling prophecy." Three groups of seventh grade
students were given success feedback, failure feedback, or no feedback
after taking a scholastic aptitude test. Students given success
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feedback scored statistically significantly higher in subsequent testing 
than those given failure feedback. 

In looking at the emotional impact of testing, Shannon (1978)
examined the effect of withholding feedback from students. Findings
from this investigation showed that tenth grade students who received
pretest counseling or posttest score interpretation maintained the same
attitude toward the subject content, whether positive or negative.
However, students who received no feedback on test results had
significantly more negative feelings toward the subject than the control
group.

Summary. Previous research has demonstrated that the type of
information given to students about their examinations influences test
scores and attitudes. Although further investigations are warranted to
determine the extent of the impact, test administrators must be informed
that scores can vary as a result of sharing various types of
information. Often the test directions do not specify how to provide
feedback, but previous research suggests that, at the very least,
students will receive higher scores if they are told of an impending
test, are shown the results after scoring, and are informed about the
basic test structure.

Mechanics of Test Taking 

As early as 1949, Thorndike wrote that students exhibited different
levels of understanding about the mechanics of test taking. Studies
have shown that many students not only fail to comprehend the specified
directions provided with standardized tests but also cannot make wise
choices in guessing or eliminating answers (Anastasi, 1976). Part of an
examiner's role is to ensure that students understand the techniques for
taking the particular exam being administered.

Use of separate answer sheets. Most standardized achievement tests
that require specialized directions use machine-scorable answer forms.
Since these forms are unlike the formats of daily work encountered by
students, elementary pupils are often unaware of the proper method of
filling in answers for multiple-choice items. In addition, Traxler
(1963) found that the mean test scores from sloppily marked forms were
significantly lower than scores on well-marked answer sheets.

The use of answer sheets that are separated from the test booklet
can be difficult for elementary students because they make mistakes as
they transfer from question to answer space (e.g., marking on the wrong
answer line or in the wrong answer space) (Bell, Hoff, & Hoyt, 1964;
Cashen & Ramseyer, 1969). In one study, students in grades one to three
who recorded answers on separate answer sheets received significantly lower
scores than students who recorded answers in the test booklets. Even
with practice, scores were lower when students used a separate answer
sheet than when they answered questions in the test booklet (Ramseyer &
Cashen, 1971). Gaffney and Maguire (1971) reported similar results:
separate answer sheets from students in grades two and three were
filled in improperly regardless of the directions given to the
students.

Guessing and systematic elimination. Since most students complete
school work on a criterion-referenced basis, they are not experienced in
dealing with a situation where many answers to questions are unknown.
Therefore, it is rare that students, particularly those in grades one
through three, will know how to eliminate answers, guess, change answers,
or check work (Erickson, 1972; Traxler, 1963). Most test manuals do not
provide directions for the test administrator in teaching students to
guess or check work.

Several researchers have found that guessing will raise scores
regardless of the mathematical correction used in scoring (Hammerton,
1965; Sheriffs & Bommer, 1954; Slakter, 1968). Taylor (1966) and
Moore, Schutz, and Baker (1966) studied the impact of using different
instructions to either encourage or discourage guessing and found more
omitted and unfinished items when students were told not to guess. In
another study, Aiken and Williams (1978) investigated the effect of
instructing students to guess and found that formulas used to "correct"
test scores for guessing affect students with poor knowledge of subject
matter more than those with high knowledge.
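The studies above do not say which correction formulas were applied. As a hedged illustration only, the sketch below assumes the classical correction R - W/(k - 1), where R is the number right, W the number wrong (omitted items are not counted), and k the number of answer options. Under that assumption a blind guess has an expected payoff of zero, while a guess made after eliminating even one distractor has a positive expected payoff, which is one way to see why guessing can raise scores despite a correction. The function names are hypothetical.

```python
from fractions import Fraction


def corrected_score(right: int, wrong: int, options: int) -> Fraction:
    """Classical correction for guessing (assumed): R - W / (k - 1); omits count zero."""
    if options < 2:
        raise ValueError("an item needs at least two answer options")
    return Fraction(right) - Fraction(wrong, options - 1)


def expected_guess_gain(options: int, eliminated: int = 0) -> Fraction:
    """Expected change in the corrected score from guessing on one item instead of
    omitting it, after ruling out `eliminated` wrong options and choosing at random
    among the options that remain."""
    if options < 2 or not 0 <= eliminated < options:
        raise ValueError("need at least two options and at least one option left")
    p_right = Fraction(1, options - eliminated)
    penalty = Fraction(1, options - 1)  # points deducted per wrong answer
    return p_right - (1 - p_right) * penalty


if __name__ == "__main__":
    print(corrected_score(right=28, wrong=8, options=4))  # 76/3
    for k in range(3):
        print(k, expected_guess_gain(options=4, eliminated=k))  # 0, 1/9, 1/3
```

Exact fractions are used so that the zero expected gain for blind guessing comes out exactly rather than as a rounding artifact.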

Checking work. In a related area, students frequently ask if they
should change answers after reconsidering the question. Most
researchers have concluded that students who change answers tend to get
higher scores (Berrien, 1939; Lynch & Smith, 1972; Mercer, 1979; Reile
& Briggs, 1952). Bath (1967) calculated that when a response is changed,
there is a three to one chance that the new response will improve rather
than lower the final score. In an early report, Lowe and Crawford
(1929) analyzed 21,903 true-false test items and found that
correct changes were made almost twice as frequently as incorrect
changes. Similarly, Matthews (1929) examined 22,000 multiple-choice
items on a college level test in which 555 changes in answers had been
made. Of those changes, 52% raised the score, 21% lowered it, and 26%
had no effect. On another test, 18,000 true-false items were studied,
and of the 570 changes, 63% raised the score, 34% lowered it, and 3% had
no effect. A breakdown by "superior" and "inferior"
students showed that although "inferior" students made more changes,
only 4y% of their changes raised the score, whereas 68% of the changes
made by "superior" students raised the score.


This work was preceded by Lehman in 1928, who examined the results
of high school students changing answers on a true-false test. He
concluded that high scoring students tend to make fewer but more
accurate changes than low scoring students. Conversely, poor students
often make wrong initial decisions as well as incorrect revisions.
Although further research has not been undertaken to examine possible
causes, Lehman suggests that low performing students may not know how to
evaluate their own work.
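The gain-to-loss ratios quoted in these answer-changing studies can be checked directly from the reported percentages. The sketch below simply reuses the two sets of Matthews (1929) figures given above; the class and method names are hypothetical, and because the published percentages are rounded, the item counts it derives are approximate.

```python
from dataclasses import dataclass


@dataclass
class AnswerChangeStudy:
    label: str
    changes: int        # number of answers that were changed
    pct_raised: float   # percent of changes that raised the score
    pct_lowered: float  # percent of changes that lowered the score

    def gain_to_loss_ratio(self) -> float:
        return self.pct_raised / self.pct_lowered

    def net_items_gained(self) -> int:
        # Approximate, because the published percentages are rounded.
        return round(self.changes * (self.pct_raised - self.pct_lowered) / 100)


studies = [
    AnswerChangeStudy("Matthews (1929), multiple-choice test", 555, 52, 21),
    AnswerChangeStudy("Matthews (1929), second (true-false) test", 570, 63, 34),
]

for s in studies:
    print(f"{s.label}: {s.gain_to_loss_ratio():.2f} to 1, "
          f"net gain of about {s.net_items_gained()} items")
```

On these figures, changes that helped outnumbered changes that hurt by roughly 2.5 to 1 and 1.9 to 1, somewhat below the three to one chance reported by Bath (1967).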

Problem attack strategies. The procedures students use to answer
questions have been shown to affect test scores. For instance, in two
early studies (Holmes, 1931; Washburne, 1929), students who read the
comprehension questions before reading the selection received higher
scores than students who read the selection first.

In 1933, Weidemann and Newens investigated the effect of different
instructions for answering true-false questions. Students were told
to use a specific reasoning pattern to decide if the answer was true or
false. Test scores were found to vary according to the reasoning
pattern given for deciding how to answer.
Summary. Since a meta-analysis of the literature on training
students with a package of test-taking skills appears in an earlier
section of this chapter, only studies that examined the use of a
single TW strategy (e.g., guessing) have been discussed here. As
indicated by some studies, students who guess, change answers, and use
their time wisely tend to get higher scores. The test administrator
often determines whether students are trained in the mechanics of test
taking, so that test-wise skills do not have to be a discriminating factor
across students. Unless teachers are instructed to prepare students, such
preparation may not happen. Therefore, classroom test scores may be a function of
test administration training, making score interpretation more
difficult.

Environmental Factors 

Although extensive research has not been done on the influence of
various settings on group test performance, several investigations show
the environment to be a potential determinant of test scores. Three
studies have found that when using separate answer sheets, students
sitting in chairs at tables received higher scores than students sitting
in chairs with a small attached writing surface (Kelley, 1943; Traxler &
Hilkert, 1942; Traxler, 1963).

The arrangement of the desks in a classroom may also indirectly
affect test results, as shown in a study by Fenton (1927). When college
students were seated closely and thus given the opportunity to cheat,
63% of the students did cheat. In a related study, Axelrod, Hall, and
Tams (1972) found that when students sat in row formations, their study
rates were higher than when they sat at tables. The use of row
formations may also improve test performance if it encourages attentive
behavior. Environmental extremes (such as poor lighting, extreme
heat, or poor writing surfaces) may affect test scores. In a personal
communication, Rechebei (1980) told of Micronesian students taking tests
while sitting cross-legged on floor mats. Information provided by the
scoring service (Loret, 1980) indicated that some of the Micronesian
scores were not valid because the pencil marks were made on tests
supported by students' legs, and the answers were too light to be scored
by machine.

The place the test is administered may have some bearing on student
scores. Seitz, Abelson, Levine, and Zigler (1975) found that IQ scores
from disadvantaged preschoolers were significantly higher when they were
tested at home rather than at school or in an office. In a similar
study, Stoneman and Gibson (1978) found that developmentally disabled
preschoolers got significantly more items correct when tested in a small
testing room than when they were assessed in their own classroom.

Teachers may not be able to choose the testing setting since the 
classroom is often the only available place. However, the recognition
that setting influences student performance may discourage the use of 
inappropriate places for testing (e.g., the cafetorium or the 
principal's office) and direct the examiner's attention to details of 
seating arrangements. 

The atmosphere of the working situation can lower anxiety and
motivate performance. Millman and Pauk (1967) suggest that students
may be less anxious when they are concentrating on a task. They
recommend that teachers assist the students by creating an environment
conducive to concentration: quiet, separated desks, and structured
procedures.

Summary 

Although the effect of training test administrators on student test
scores has not been investigated directly, studies reviewed in
this section suggest that testing conditions which are under the
examiner's control do influence test scores. Students' test scores were
higher when:

1. the students had low anxiety levels,

2. the examiners were familiar to the students,

3. a "positive" climate was maintained prior to and during
testing,

4. the students were informed of the nature and purpose of the
examination,

5. some type of feedback was given after the examination,

6. the directions and general test-taking strategies were
understood by the students, and

7. an appropriate setting was used.

Since these situations are established by the test administrator,
examiner behavior may be a differentiating variable in test score
comparison.

Conclusions

There are no empirical studies that show the degree to which
untrained or trained test administrators maintain standardized
conditions, or that show the differential effect of examiner training on
test scores. However, each year more school districts
(especially larger cities with evaluation units) are becoming concerned
with quality control measures as they elect to supervise the testing by
observing teachers give tests (Krueck, 1981).

If the conditions are thought to lower scores, school districts may
provide training for teachers in test administration prior to the annual
district-wide examinations. However, there are no empirical data to
show that training examiners will affect test scores, will encourage the
implementation of standardized procedures, will improve student test
scores, or will change teacher behaviors. There is no basis for
deciding whether to provide training. Hence, decisions about
teacher training are made according to budget feasibility rather than
perceived need.

Because of the need for properly administered tests, it is recommended
that research on the effect of training test administrators be
conducted for three purposes:

1. to determine if training influences the implementation of
standardized procedures,

2. to document the effect of training on test scores, and 

3. to eliminate differences in trainers as a contaminating 
variable in test score comparison. 

To investigate the effect of proper standardized test conditions on 
test scores, a true experimental study with classrooms of students 
randomly assigned to treatment and control groups is needed. Several 
outcome measures should be used to determine the influence of training 
test administrators: student test scores, teacher behaviors during 
testing, and student behaviors during testing. 
Summary

Previous research has produced strong evidence that student test
scores can be increased as a result of reinforcement procedures, student
practice and training in test-wiseness, and manipulation of various test
administration techniques. Although research on the effect of
systematic training of test administrators was not located, findings from
studies that investigated the impact of various test administration
techniques indicate that changes in variables that are under the
examiner's control have a substantial effect on student scores.

Such variations in test scores have serious implications for student
selection, program comparison, student diagnosis, and funding. One of
the most serious consequences is that the wrong students may be
identified, because test scores may result in part from motivation,
test-wiseness, or test administration rather than knowledge. However, the
evidence from previous research is not conclusive. Some of the previous
studies have major methodological problems which raise questions about
their generalizability to other students: (a) examiner bias, (b) small
number of subjects, (c) no control group, (d) unspecified treatment, and
(e) non-random assignment. In addition, previous research has not
sufficiently investigated the effects of reinforcement and training in
test-wiseness on the group achievement test performance of primary-aged
children, or the effect of training test administrators in standardized test
administration procedures.

As noted earlier, the contents of this review establish the
theoretical foundation upon which the procedures and materials for this project
were based. The following section describes those materials and
procedures in detail. As will be noted, the rationale for what to
include in much of the training materials for the experimental groups
was based on the findings of this review.
CHAPTER III
PROCEDURES 

As described in more detail below, participating schools from each of 
three districts were randomly assigned to one of three groups. Participants 
in Experimental Group I received all of the project's training materials 
(i.e., teachers were trained in standardized test administration techniques, 
and students viewed the How To Take Tests filmstrips, completed the workbooks, 
took the practice tests, and participated in the reinforcement system). 
Participants in Experimental Group II viewed the How To Take Tests filmstrips 
and took the practice tests. Participants in the control group were not
exposed to any of the project-related materials. The following section
contains descriptions of the various training materials which were used as
part of the experimental treatments in Group I or Group II. The remainder
of this chapter will describe the sample of participating schools and 
students, the procedures for implementing, monitoring, and assisting with the 
experimental treatment, and the instrumentation used to collect data about the 
effectiveness of the experimental treatments. 
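The assignment design above (schools randomized to three groups separately within each district) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual procedure: it assumes randomization is blocked by district and that schools are dealt out in turn so group sizes within a district differ by at most one. The function name and the district and school names are hypothetical.

```python
import random

GROUPS = ("Experimental Group I", "Experimental Group II", "Control")


def assign_schools(schools_by_district, seed=None):
    """Randomly assign schools to the three groups separately within each district.

    Schools are shuffled, and the order of the groups is shuffled too, before
    schools are dealt out in turn. Group sizes within a district therefore
    differ by at most one, and no group systematically receives leftover schools.
    """
    rng = random.Random(seed)
    assignment = {}
    for district, schools in schools_by_district.items():
        shuffled = list(schools)
        rng.shuffle(shuffled)
        group_order = rng.sample(GROUPS, k=len(GROUPS))
        for i, school in enumerate(shuffled):
            assignment[school] = (district, group_order[i % len(group_order)])
    return assignment


# Hypothetical districts and schools, for illustration only.
example = {
    "District A": ["A1", "A2", "A3", "A4"],
    "District B": ["B1", "B2", "B3"],
    "District C": ["C1", "C2", "C3", "C4", "C5"],
}

for school, (district, group) in sorted(assign_schools(example, seed=1981).items()):
    print(district, school, group)
```

Blocking by district keeps district-level differences (such as which standardized test a district uses) from piling up in one treatment group by chance.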

Description of Materials 
Based on the review of literature reported in Chapter II and the
results of the previous State Refinements contract, the following four areas
were identified that might adversely affect the validity of students' scores
on standardized achievement tests:

1) Differential levels of test-taking skills on the part of students.

2) Students' lack of familiarity with, and consequent confusion from, the
question format used in the district's standardized test.

3) Lack of motivation on the part of students to do their best on the
standardized test.

4) Inappropriate administration of the standardized test.

The materials described below were developed by the project to eliminate or
substantially reduce the influence of these variables on students'
standardized achievement test scores.

Filmstrips and Workbooks: Teaching Students 
How to Take Tests 

As noted in the review of literature, previous research has demonstrated
that training students in test-taking skills raises the students' scores on
standardized tests. The fact that students' scores on a test of reading
comprehension can be raised by training them in test-taking skills suggests
that some factor besides reading ability is being measured by the test. Since
students already possess test-taking skills to different degrees, a training
program which allows all students to master test-taking skills will
increase the validity of the test for measuring reading comprehension. This
increase in validity results from the fact that once all students have
mastered test-taking skills, the skills no longer differentially affect
or confound scores on the test. The student training materials used in
this project consisted of nine instructional filmstrips, nine tape-recorded
narrations, and accompanying student booklets. The development and content of
these instructional materials are described below.

Development of training objectives. In developing the training materials
for teaching test-taking skills, an analysis of the content, directions, and
format of frequently used standardized achievement tests served as the primary
resource. To decide which standardized tests should be examined, information
was considered from the following sources: (a) which tests are used by Title
I projects in Utah, (b) which tests are used by Title I projects nationally,
(c) which tests have been formally adopted by districts and states, and
(d) which tests were being used by the districts willing to participate in the
project.
The number of Title I projects in the state of Utah using a
particular standardized test is shown in Table 9. Tests used by districts
participating in the project are marked with an asterisk (*).

Table 9

Use of Tests in Utah Title I Projects

Number of Title I projects    Test

9                             California Achievement Test
9                             Gates-MacGinitie
8                             Stanford Achievement Test*
5                             Iowa Test of Basic Skills*
4                             SRA
3                             Woodcock Reading Test
2                             Comprehensive Tests of Basic Skills*
2                             Metropolitan Achievement Test*

* indicates a test used by a district participating in the project

The frequency of use of a particular test by Utah Title I projects was
somewhat different from the frequency of use by all Title I projects in the
country. According to staff at the Northwest Regional Educational Laboratory
(NWREL), national project utilization of tests occurs in the following order:
CAT, SRA, MAT, Gates-MacGinitie, SAT, ITBS (see Appendix B for letter).

Staff at NWREL also reported frequencies indicating test adoptions for
both districts and states by region, as reported by McGraw-Hill. (Note: This
information should be interpreted cautiously since it was part of
McGraw-Hill's promotional material.) Table 10 displays the district and state
adoption totals by region (see Appendix B for a complete listing).

Table 10

Number of Test Adoptions for Districts and States by Region 





CAT 


CTBS 


ITBS 


MAT 


SRA 


SAT 


Midcontinent region 
Di stricts 
States 


9 


1 

2 


9 
1 


2 


1 




Western 

Districts 
States 


11 

0 


,12 

o 

C 


1 


1 




1 

i 
i 


Southern 

Districts 
States 


10 
5 


10 
1 


3 




8 


3 


Eastern 

Districts 

States 


10 

. 2 


2 

■ 1 




1 

j__ 


5 

~ 1 




Total 

Districts 
States 


40 
9 


25 
6 


13 
1 


4 
1 


14 
1 


4 
1 



Using the preceding information, decisions were made about which tests
to analyze in developing the student training materials for taking tests.
Table 11 summarizes the rationale for the six tests included for analysis.
Each of the tests listed in Table 11 was analyzed to identify (a) difficult
vocabulary, (b) difficult phrases, (c) series of directions, (d) new symbols,
and (e) examples of different response formats. An example of the data
collection form used to analyze tests (this particular form was for the
reading comprehension subtest of the MAT) is included in Appendix B to
illustrate the type of information obtained. Similar analyses were completed
for each test. In addition, as shown in Table 12, each test was examined to
determine which subtests were included in the total reading score, the number
of items in each subtest, the minutes allowed for each subtest, the content of
Table 11

Summary of Test Use for Project Tests

Test    Description of Utilization

CAT     - Most commonly used by Title I projects in Utah and nationally.
        - Commonly used in all regions.
        - Most often adopted by districts and states.

CTBS    - Not commonly used by Title I projects in Utah or nationally.
        - Adopted by many districts and states, especially in the West
          and South.
        - Used by Cache School District.

ITBS    - Used by 5 Title I projects in Utah but seldom used by Title I
          projects on a national level.
        - Adopted by districts and states primarily in the Midcontinent
          region.
        - Used by Nebo School District.

MAT     - Used by only 2 Title I projects in Utah, but third most often
          used nationally.
        - Adopted by few districts and states.
        - Used by Logan School District.

SAT     - Commonly used by Title I projects in Utah, but not nationally.
        - Seldom adopted by districts or states.
        - Used by Granite School District.

SRA     - Commonly used by Title I projects nationally and by 4 projects
          in Utah.
        - Adopted primarily by Southern and Eastern districts.
        - Used by Alpine School District.

each subtest, and the format for administering each subtest. The contents 
of the subtests making up the total "reading 

However, several subtests were unique to one test (SAT, Reading: Part A;
ITBS, Sentences and Word Analysis; MAT, Word Knowledge; SRA, Listening
Comprehension).

Based on the analyses described above, test-taking skills to be taught 
during the student training were identified and phrased as objectives. The 
Table 12
Subtests

Test      Level  T or S(1)  Category(2)  Tests in "Total Reading" Score  Number of Items  Minutes  Related Subtests

CAT       12     T                                                       10
                 S          W            Phonics Analysis                15               25
                 S          W            Structural Analysis             11               14
                 S          V            Reading Vocabulary              15               13
                 S          PC           Reading Comprehension           20               20

CTBS      C      T          V            Reading Vocabulary              33               15
                 S          SC           Reading Comp.--Sentences        23               20
                 S          PC           Reading Comp.--Passages         18               21

ITBS      8      S          SC           Pictures                        23               12
                 S          SC           Sentences                       16               7
                 S          PC           Stories                         28               15
                 S          V            Vocabulary                      30               14
                 S          W            Word Analysis                   57               20

MAT (71)  P2     S          V            Word Knowledge
                 T          W            Word Analysis
                 S          SC           Reading--Sentences
                 S          PC           Reading--Stories

SAT       P2     S          B            Reading Part A                  45               20
                 S          PC           Reading Part B                  48               25
                 T                                                       30               10
                 S          W            Word Study Skills               35               15
                 T          V            Vocabulary                      37               20

SRA       C      T          W            Letters/Sounds                  20               15
                 T                       Listening Comprehension         20               25
                 T          V            Vocabulary                      25               15
                 S          PC           Comprehension                   24               30

(1) Teacher directed (T) or Student directed (S)
(2) V - Vocabulary
    W - Word Analysis
    B - Both Vocabulary and Word Analysis
    C - Comprehension



original list of objectives was too long. Given the limited amount of
instructional time (approximately 270 minutes) available for the student
training, the original list of objectives was reduced to include only those
skills which were needed most frequently across the six tests. The tests were
again analyzed, the most frequently occurring skill areas were identified, and
objectives for nine 30-45 minute instructional lessons were finalized (see
Table 13). Skill areas making up the nine lessons included both general
Table 13

Objectives for Student Training Filmstrips

FILMSTRIP 1--INTRODUCTION TO FILMSTRIP SERIES

1. Understand that it's important to listen carefully and try your best on
tests.

2. Start working at "go" sound.

3. Stop working at "stop" sound.

4. Put finger on page or item number when directed to do so.

5. Follow one-step directions in the booklet.

6. Stop working when the stop signal is given before a task is finished.

7. Work fast when told to do so.

FILMSTRIP 2--MECHANICS OF TEST FORMAT

1. Understand that test scores are used to determine what students need to
learn.

2. Mark only one answer for each question.

3. Use answer space, circle, and oval interchangeably.

4. Mark answer space correctly.

5. Erase completely.

6. Work a "sample" with the class.

7. Follow four-step directions.

8. Work items in sequence whether items are arranged in rows or columns.

FILMSTRIP 3--RULES FOR TAKING TESTS

1. Raise their hands if they need a new pencil or if they need help from
the teacher.

2. Understand that the teacher may help with directions but may not help
figure out answers.

3. Point to every word as they read the test item.

4. Stop working when they see a stop sign.

5. Go on to the next page when they see a "go on" sign or if nothing is
printed.

7. Go back and check their work.

FILMSTRIP 4--VOCABULARY I

1. Tell what a vocabulary test is.

2. Find a word that means the same as an underlined word.

3. Tell if the right answer names the whole picture, names part of the
picture, or tells about the picture.

4. Tell why a "tricky" answer is wrong.

5. Use clue words to find word meanings.

6. Substitute printed clue words with answer choices.

FILMSTRIP 5--VOCABULARY II

1. Find the word that is opposite of an underlined word.

2. Find a word that means the same as a definition given orally.

3. Tell why tricky answers are wrong.
Table 13 (continued)

FILMSTRIP 6--WORD ANALYSIS

1. Find the letters that stand for the beginning or ending sound in a
word.

2. Find the letters that stand for the middle vowel sound in a word.

3. Find the word with the same sound as a spoken word.

4. Find the word with the same sound as the underlined letters in a
written word.

FILMSTRIP 7--TEST-TAKING STRATEGIES

1. Select one answer for each item for three-item pictures.

2. Check three-item pictures by seeing if all answers relate to each
other.

3. Find the best word to describe a picture.

4. Discriminate between tricky wrong answers that are look-alikes and
relatives.

5. Use the information in the picture to find the right answer and not be
swayed by personal experiences.

6. Eliminate obvious wrong answers and then guess.

FILMSTRIP 8--SENTENCE COMPREHENSION

1. Do sentence comprehension test items in three formats:

a. Find sentence that tells about a picture.

b. Find word that completes sentence so it tells about a picture.

c. Find word to complete a sentence so that it makes sense.

2. Tell why an answer choice does not make a true sentence.

3. Try each answer choice in a sentence before marking the correct word.

FILMSTRIP 9--PARAGRAPH COMPREHENSION

1. Find the answers to literal comprehension questions.

2. Find the answers to inferential comprehension questions.

3. Find the answer that tells the main idea of a story.

4. Find the best name for a story.

5. Tell why distracting answer choices are wrong.
test-taking skills, as listed under Filmstrip 2--Mechanics of Test Format and
Filmstrip 3--Direction Following, and test-taking skills specific to subtests
of the six standardized tests analyzed, such as those reflected in the
objectives for Filmstrips 3-9.

Skills which are general to all standardized tests (such as marking an
answer space, erasing, working a sample, stopping, and checking work) were the
first skills taught (Filmstrips 2 and 3 in Table 13). Skills specific to
subtests were taught next, in a sequence which moved from simple to complex
(e.g., from simply finding the word that best tells about a picture to finding
the main idea of a paragraph). Prior to the instructional lessons on general and
specific test-taking skills, students were taught how to respond to the medium
of instruction used in the training package (see Table 13 for skills listed
under Filmstrip 1--Introduction to Filmstrip Series).
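The objectives were cut back to the skills needed most frequently across the six tests. One simple way to carry out such a tally is sketched below, assuming each test's analysis has been reduced to a list of required skills. The skill labels per test are hypothetical placeholders, not the project's analysis data, and the cutoff `min_tests` is an assumed parameter.

```python
from collections import Counter


def most_frequent_skills(skills_by_test, min_tests):
    """Return skills required by at least `min_tests` tests, most widely needed first."""
    counts = Counter()
    for skills in skills_by_test.values():
        counts.update(set(skills))  # count each skill at most once per test
    keep = [skill for skill, n in counts.items() if n >= min_tests]
    return sorted(keep, key=lambda skill: (-counts[skill], skill))


# Hypothetical skill lists for illustration only.
skills_by_test = {
    "CAT": ["mark answer space", "erase completely", "synonym", "main idea"],
    "CTBS": ["mark answer space", "erase completely", "sentence completion", "main idea"],
    "ITBS": ["mark answer space", "picture-sentence match", "main idea"],
    "MAT": ["mark answer space", "sentence completion", "word analysis"],
    "SAT": ["mark answer space", "erase completely", "word analysis", "main idea"],
    "SRA": ["mark answer space", "listening", "main idea"],
}

print(most_frequent_skills(skills_by_test, min_tests=3))
# ['mark answer space', 'main idea', 'erase completely']
```

Lowering `min_tests` admits more skills; in practice the cutoff would be chosen so the selected objectives fit the available instructional time.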

Rationale for filmstrips as the medium of instruction. Several
alternatives were considered for delivering the content of the student training
(e.g., classroom teacher lecture, staff presentation, student workbooks). A
major concern with most approaches was that consistency across classrooms
would be difficult to maintain. Instruction provided by classroom teachers or
project staff would probably vary in quality from classroom to classroom and
threaten the internal validity of the study. Another concern was the amount
of time required for teacher preparation. If teachers had been asked to
prepare for nine 30-minute presentations, it would probably have required at
least 270 minutes of preparation time per teacher (30 minutes for each
lesson). By using filmstrips as the medium for implementing instruction, we
could be more confident that the entire treatment was being implemented and
that the quality of instruction was consistent in each of the 40 classrooms.
In addition, teacher preparation time was reduced to 90 minutes per teacher
(10 minutes for each filmstrip).
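The preparation-time comparison reduces to simple arithmetic. The per-teacher figures come from the text; the project-wide total assumes one teacher per classroom across the 40 classrooms, which the text does not state explicitly.

```latex
\begin{aligned}
\text{lesson preparation:}\quad & 9 \times 30\ \text{min} = 270\ \text{min per teacher}\\
\text{filmstrip preparation:}\quad & 9 \times 10\ \text{min} = 90\ \text{min per teacher}\\
\text{saving:}\quad & 270 - 90 = 180\ \text{min per teacher}\\
\text{project-wide (assumed 40 teachers):}\quad & 40 \times 180 = 7200\ \text{min} = 120\ \text{h}
\end{aligned}
```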
The filmstrips were developed to be shown on a classroom chalkboard.
The characters and pictures are line drawings which appear in a chalk color on
the board. Classroom students were surprised and intrigued by the realism of
the filmstrips, almost as if large characters drawn on the chalkboard had come
to life. The fact that the filmstrips were so different from anything
students had seen before helped to keep their attention, and the simplicity of
the line drawings and chalk color helped to keep the students' attention
on the instructional content.

Instructional philosophy. The material in the nine filmstrips is taught
using a "direct instructional" format. That is, specific skills are modeled,
then the students are guided through practice and are tested on their
competence. The direct instructional sequence is used (a) to clearly
establish the intent of the instruction, (b) to reduce incorrect responses,
and (c) to provide students frequent opportunities to practice and to provide
the teacher with frequent opportunities to determine how well the students are
progressing.

The five types of instructional objectives in the direct instructional
method are listed and defined below:



Objectives and Key Words

1. Teaching Objective
   The students are told the specific task to be learned.
   Key words: "You will learn . . . ."

2. Modeling Objective
   The correct way to complete a task is demonstrated. Non-examples of
   the task may also be shown.
   Key words: "This is the right way. . . ." "This is not . . . ."

3. Leading Objective
   The students respond with the filmstrip characters or the teacher.
   Key words: "Say it with me. . . ." "Do it with your teacher. . . ."

4. Testing Objective
   The students respond alone.
   Key words: "When I say go, you . . . ."

5. Correcting Objective
   The filmstrip or the teacher shows the correct response.
   Key words: "Your answer should look like this. . . ."
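The five objectives form a fixed order of events for each skill. The sketch below encodes that order using the pattern in the Filmstrip #1 example that follows: one teach, model, and lead step, then a test followed immediately by a correction for each test item. The names `Step` and `lesson_plan` are hypothetical, and the one-correction-per-test rule is an assumption drawn from that example rather than a rule stated in the report.

```python
from enum import Enum

class Step(Enum):
    """The five direct-instruction objectives, each paired with sample key words."""
    TEACH = "You will learn . . . ."
    MODEL = "This is the right way. . . ."
    LEAD = "Say it with me. . . ."
    TEST = "When I say go, you . . . ."
    CORRECT = "Your answer should look like this. . . ."

def lesson_plan(test_items: int) -> list[Step]:
    """Order the steps for one skill: teach, model, and lead once,
    then test each item and immediately show the correct response."""
    if test_items < 1:
        raise ValueError("at least one test item is needed")
    plan = [Step.TEACH, Step.MODEL, Step.LEAD]
    for _ in range(test_items):
        plan += [Step.TEST, Step.CORRECT]
    return plan

print([step.name for step in lesson_plan(2)])
# ['TEACH', 'MODEL', 'LEAD', 'TEST', 'CORRECT', 'TEST', 'CORRECT']
```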

An example of how the five-step instructional sequence is used in the first
filmstrip is shown below.

Use of story line and characters. After selecting filmstrips as the
instructional medium for training students in test-taking, the next step was
the development of a story line and characters. Several exciting scripts with
amusing and involved plots were written and piloted with individual children.
During the pilot testing of these scripts, staff noticed that the complexity
and interest of the story line were interfering with students' ability to
attend to the instruction. Consequently, we decided that the story line must
be kept simple: interesting enough to hold students' attention, but not so
interesting that it would interfere with instruction.

Familiar animals with typical distinguishing characteristics and
predictable personalities were chosen as the main characters (e.g., the wise
owl, the smart and crafty fox, the slow and lovable gorilla, and the shy
raccoon). The characters encourage students by stressing the importance of
learning to be good test-takers. In Filmstrip #1, Professor Owl tells his
animal class, "Did you know that there are magic tricks to taking tests that
everybody can learn? Yes, indeed." The characters also offer helpful "hints,"
or learning strategies. For example, in Filmstrip #6 students are told by
Professor Owl, "Here is a hint. You first say the word and the sound. Let's
pretend the word is cat. You say the word and the sound like this. Cat, k."
Throughout the filmstrip series, the characters offer timely prompts (e.g.,
erase completely, be sure to check your work, don't try to find the same
letters).
EXAMPLE OF DIRECT INSTRUCTION SEQUENCE

Instructional
Sequence      Narration

TEACH:        Now, it's time to learn a new word. We will learn it on the
              next page, but you must listen very carefully. Point to page
              number three. Your finger should be pointing to the number
              three . . . at the bottom of page three, like this. Listen,
              here comes the new word.

MODEL:        These three numbers are called . . . "item numbers."

LEAD:         What are the numbers called? Item numbers.

TEST:         Good. Now, point to item number one. You should be pointing
              to item number one . . .

CORRECT:      . . . like this.

TEST:         Now point to item number three. You should be pointing to . . .

CORRECT:      . . . item number three, like this. Good.
The characters also reinforce students for trying and for learning. In
Filmstrip #6, Owl says, "You did just fine, Racky. And so did everybody else.
I'm proud of you." And in Filmstrip #2, Owl remarks, "You students were
really, really good," and the rest of the characters respond, "Yeh, they are
fantastic."

Throughout the nine filmstrips, the characters demonstrate some of the
anxieties which students may be feeling about test-taking. For example, in
Filmstrip #1, Owl announces that the animals are going to study a very
interesting and important subject: how to take tests. The characters respond
as follows:

Everyone: [Gasp] Tests!

Owl: Of course! Don't you like tests?

Gorilla: Not me!

Bunny: Me neither.

Mice: Neither do we!

Foxey: Well, I do!

Everyone: Booooooo!!!!

Owl: Now, now, students. Just a moment, please! Being able to take tests is
very important to learn!

Racky: But taking tests always scares me. I mean, I just get
t-e-r-r-i-f-i-e-d!!!

Gaffy: S-s-so do I.

Gorilla: Even I get scared.

Foxey: Well, not me!

Owl puts all the characters (and, hopefully, any students who are also anxious
about test-taking) at ease by telling them that taking tests is easy for Foxey
because he knows the secret. Owl explains that there are magic tricks to
taking tests that everybody can learn.

Characters also point out misconceptions and model correct and incorrect
test-taking strategies. In Filmstrip #5, Simon the Snake tries to trick
students into selecting the wrong option for the following item:
"Huge" means

   0 laugh
   0 hug
   0 large
   0 small

Simon makes these comments:

   "'Laugh' isss a good choicccce. 'Laugh' looksss a lot like 'large,' 'hug'
   would be a sssensssational anssswer because 'hug' looksss lot'sss like
   'huge,' and 'small' could be a good anssswer because 'sssmall' is the
   opposite of 'huge.'"

Each of these false lines of thinking is corrected by Owl and classroom
students.

Characters also take turns modeling correct implementation of test-taking
skills. In Filmstrip #3, Jack models "checking your work" by completing
several test items and verbalizing his thoughts:



   OK, let's see. First, I look at the picture. Then I point to each choice
   as I read it. Head, hat, wear. Hat is the correct choice, so I mark hat.
   Ho huh!

   Then, I read the second item. Tree, fall, leaf. Then I mark fall.

   Now, I've come to a stop sign. The teacher hasn't said "stop" yet, but I
   am finished with my items, ho huh. So, I will go back and check my work.

   Hat. Yup, I still think this is the best answer, so I'll leave it the way
   it is.

   Fall. Oops, that's not the best answer. I'd better change it now.

   There! Ho huh! That's better. The teacher says "stop," . . . so I don't
   have time to do any more. I just put down my pencil and wait to hear what
   I am supposed to do next.
In Filmstrip #3, Foxey, who is usually right and always stuffy, provides
an example of an incorrect test-taking strategy. A test-taking rule (point to
every word in a test item as you read it and think about it before you pick
your answer choice) has just been given by Owl. The following sequence then
takes place:

Owl: Here is a test item. Foxey, show us how to follow rule number three
for this test item. Point to each word as you read it before you tell us the
best answer.

Foxey: Oh, don't be so stuffy, Professor. I don't have to point and read
every word. I can tell with just a quick glance that the answer is "food."

Owl: Now, Foxey, don't answer too quickly. The rule is, "Point to every word
in the test item as you read it and think about it before you pick your
answer."

Foxey: Ho hum. What a bore! Ok, food, . . . dog, . . . Oh, my, apple! Why,
that is a better answer.

Owl: See there, Foxey? That is why it is important to point to every word as
you read it.
Professor Owl is the most prominent character in each of the filmstrips. He
is always wise, honest, straightforward, and the primary teacher. Each of the
nine filmstrips has a simple central theme. The characters interact with each
other just enough to add interest and develop the outlined themes. One or two
characters are prominent in each filmstrip, with new characters such as
Detective Nancy True and Erp occasionally emerging.

Teacher/filmstrip interaction. The filmstrips are constructed so that the
teacher must interact with the filmstrip characters and with the classroom
students. Several different teacher response modes are included. For example,
Professor Owl asks the teachers to answer short questions (e.g., "Teacher, how
did your students do?"), explain or review concepts (e.g., "Teacher, could your
students explain this so Racky will understand that trying hard is
important?"), demonstrate skills (e.g., "Teacher, would you demonstrate how
this page should look?"), and check the students' work and report back (e.g.,
"Teacher, would you check with your students to see if they answered the items
correctly?").

The rationale for involving the classroom teacher was to improve the 
quality and flexibility of instruction. The teacher performs many tasks 
throughout the nine filmstrips which otherwise would be difficult, if not 
impossible. For example, the classroom teacher: 

1. Reviews important objectives of test-taking.

2. Demonstrates continuous hand movements that are difficult to convey
in still picture frames (i.e., the correct way to quickly fill in an
answer space).

3. Monitors student responses, reinforcing correct responding and
stopping the filmstrip to correct errors.

4. Provides a prompt (hand signal) for students, cuing them when to
respond.

5. Demonstrates complicated procedures that require several steps.

6. Leads and corrects practice exercises in student booklets that
reinforce skills taught in the filmstrip.

The interaction between the filmstrip characters and the classroom
teacher also provides diversity and maintains student interest. As classroom
teachers became actively involved in the student training, the students seemed
to sense the importance of the material. The teachers provided excellent
models, and the students strove to please the filmstrip characters and the
classroom teacher. Each time a teacher response is required, Professor Owl
addresses the teacher directly (e.g., "Teacher, would you and your class like
to join us to learn these magic tricks about test-taking?"). At the end of
the question or request, there is a signal to the teacher and a brief pause
(or blank spot) on the tape (approximately 2-3 seconds), which allows enough
time for the teacher to give a short response. Rather than trying to estimate
how long the teacher's response would take each time and pausing the tape
accordingly, all pauses are a standard length. If the teacher wants to do
more than can be done in the 2-3 second pause, he/she can turn off the tape
and take as much time as needed. In this way, the teacher retains complete
control of the instruction and can adjust the pace and emphasis to suit the
needs of individual students. Teachers were also encouraged to circulate
about the room during the filmstrip to check students' work and reinforce good
behavior.

Cue cards are also provided with each of the nine filmstrips. These
cards illustrate main points from the filmstrips. The purpose of these cards
is to provide a technique by which the classroom teacher can easily review
these main ideas. Prior to showing a filmstrip, the teacher's guide directs
the classroom teacher to review main points, using the cue cards provided.

Student response mode. Throughout the filmstrip, students are asked to
respond as a group either verbally, physically, or in writing. Filmstrip #1
(Introduction to Filmstrip Series) teaches the skills students need to
appropriately interact with the filmstrip. Group response is used to keep all
students actively involved in the learning process as well as to provide
feedback to the teacher on the level of student skill acquisition. By
involving the students in group response activities, the teacher can quickly
survey the class to determine who is following the lesson and who needs
special attention.

When a student response is requested, a question is asked by Professor
Owl as he looks at the classroom. Oral responses are followed by a correction
statement (e.g., "Yes, you can erase, but do not erase too often"). Classroom
teachers were encouraged to elicit a response from every student because a few
non-responders (who may not need to answer to learn) model inappropriate
behavior for those who do need to respond. Classroom teachers were advised to
give a quick drop of the hand or snap of the fingers as a signal to the
students to respond. If all members of the class are responding, an active,
exciting learning environment is generated, attention is kept on the task at
hand, and off-task behavior is not a problem. As a rule of thumb, students
were given at least two examples as part of an instructional sequence, which
provided verbal practice before any written response was required.

The most common physical response is pointing to a page or item number.
Here Professor Owl tells the students exactly where to point, and students
were prompted to follow these directions explicitly for two reasons. First,
pointing to things is an important test-taking skill for young students; it
helps them keep their place and forces them to read every word. Second, if
students are pointing, a teacher can quickly scan the desks of every child at
any time and see if everyone is on the right page or item.

When a written response is required from students, Professor Owl orally
signals a "go" and a "stop." All written tasks are performed in a student
booklet within a time limit to give the students practice in concentrating on
their task and working as quickly as possible. Also, the time limit keeps the
students moving as a unit, which is a requirement for group testing.

Individual student work booklets accompany each of the nine filmstrips.
These booklets provide short exercises so that students can practice the
skills presented in the filmstrip. The booklet exercises are short, with
either the filmstrip characters or the classroom teacher leading the
instruction. The booklets allow the students an opportunity to practice and
correct newly learned concepts before proceeding to learn additional
concepts.

The format of the student booklet items is representative of the various
formats used in the six tests analyzed. For example, the format of items one
and two in booklet #6 is as follows:

   1.  g    j    f
       0    0    0

   2.  ch   k    f
       0    0    0

The answer spaces are arranged horizontally rather than vertically, and the
letters are above the answer spaces rather than below. This item format is
representative of formats commonly used in the six tests analyzed. The
content of the items is a natural continuation of the instructional examples
used for modeling and leading within the filmstrip. Some of the booklet items
are completed with Owl or other characters, and some of the items are
completed independently by the students.

To facilitate the development of class group response, the teacher is
also encouraged to employ group response techniques when doing other
activities related to the filmstrip (e.g., reviewing previous lessons,
reteaching confusing concepts, asking questions, warming up the class before
showing the filmstrip). A hand drop or finger click is a useful cue to students
that an oral response is requested.

Field tests and pilots. Instructional sequences for each filmstrip
were field tested with individuals and small groups of students before the
story line was added. One staff member acted as the teacher and used cue
cards as a visual stimulus to walk one to three students through the entire
instructional sequence (including the student workbook). This filmstrip
simulation was observed by other staff members, who noted instructional or
procedural errors, such as the omission of proper verbal signals (e.g., "When
I say go, mark your answer space"), or places where a simple rewording would
add clarity. If the students were unable to respond appropriately, one of the
following changes was usually needed: (a) more modeling or leading, (b) a
helpful "hint," (c) a prompt, (d) a visual stimulus (underlining of key words
or a character pointing to key words), (e) addition of a simpler lead-in task,
or (f) a re-evaluation of objectives. Following this pilot, corrections were
made and another pilot was conducted before the story line was added. For
some filmstrips, the cycle of pilot-revise-pilot was repeated several times
before adding the story line.

After the story line was added, the finalized script was put into storyboard
form and photographed. Slides were then produced and sequenced in trays
to pilot test before a filmstrip was produced. Pilot tests of the slides were
conducted in one or more of the four pilot classrooms of Logan School
District. One staff member served as the classroom teacher, one operated the
slide projector, and several observed, taking notes. Following the pilot test
with slides, staff members discussed their notes and decided on specific
changes to be made. Because of the extensive field testing prior to pilot
testing with the slides, most of the corrections which needed to be made with
the slides at this point were minor (e.g., enlarge the print, eliminate red
highlighting, add more character prompting). Necessary corrections were then
made (i.e., new slides), and a second pilot was carried out if changes were
substantial. The filmstrip was then produced, the accompanying tape was
finalized, and duplicate copies were made.

Sequence of making a typical filmstrip and tape. There were many steps
in making a typical filmstrip, with the tasks moving from one staff member to
another and, in some instances, small groups working together.

To summarize the activities involved in producing the student training
materials, the steps in making a typical filmstrip and tape are outlined in
sequential order below.

1. Write the instructional sequence based on the objectives for that
filmstrip.

2. Develop the student booklet along with the instructional sequence.

3. Field test the instructional sequence with one to three students.

4. Revise the instructional sequence. 

5. Repeat steps 3 and 4 as necessary. 

6. Write the story line. 

7. Do artwork and photograph slides. 

8. Produce a tape. 

9. Pilot test the slides and tapes. 

10. Revise script and retape. 

11. Correct slides as necessary. 

12. Repeat steps 9 to 11 as necessary. 

13. Produce the filmstrip. 

14. Redo the tape incorporating the corrections. 

15. Make duplicate copies of the filmstrip and the tape. 
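Steps 5 and 12 turn the list into two revision loops. A minimal sketch that expands the list into the order in which steps would actually be carried out; the number of passes through each loop is a hypothetical input, since the report says only that it varied from filmstrip to filmstrip.

```python
def production_steps(field_test_rounds: int = 1, pilot_rounds: int = 1) -> list[int]:
    """Return step numbers from the 15-step list in the order performed.

    Steps 5 and 12 are loop controls ("repeat as necessary"), so they
    never appear in the output; they decide how often 3-4 and 9-11 recur.
    """
    if field_test_rounds < 1 or pilot_rounds < 1:
        raise ValueError("each loop runs at least once")
    steps = [1, 2]                        # write sequence, develop booklet
    steps += [3, 4] * field_test_rounds   # field test and revise (step 5 loop)
    steps += [6, 7, 8]                    # story line, artwork and slides, tape
    steps += [9, 10, 11] * pilot_rounds   # pilot, revise and retape, fix slides (step 12 loop)
    steps += [13, 14, 15]                 # filmstrip, final tape, duplicates
    return steps

print(production_steps())
# [1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 13, 14, 15]
print(len(production_steps(field_test_rounds=3, pilot_rounds=2)))  # 20
```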

The time required to complete a filmstrip varied greatly. As might be
expected, the first filmstrips and tapes required more time to make because of
the unfamiliarity of the tasks required. With later filmstrips, the time required
to produce a filmstrip and tape decreased. Table 14 shows the approximate
timelines for making the filmstrips and tapes.



Table 14

Time Line for Making Filmstrips and Tapes

[Chart: production period for Filmstrips 1 through 9, plotted by week from September through April.]

Practice Tests

Past research has shown that the following conditions are associated with
increased student scores on standardized tests: (a) administering practice
tests prior to the actual test, (b) using practice forms that closely resemble
actual test forms, (c) giving feedback to students on their test performance,
(d) training students to work independently for up to 30 minutes, (e) giving
students timed tests in reading and math prior to the actual test, and (f)
familiarizing students with the directions. As a result of these research
findings, the use of student practice tests was incorporated as an integral
part of the present project.

Students in both the Experimental I and Experimental II groups were
provided with practice in taking standardized tests throughout the school
year. Members of the project staff constructed the practice tests for
teachers to administer in their own classrooms. The practice tests were
designed to familiarize students with the procedures and formats of the
standardized test used in their district. Additionally, the administration of
practice tests provided students an opportunity to apply to a testing
situation those test-taking skills taught in the filmstrips. The following
sections describe the rationale and procedures for the development of the
practice tests.

Frequency. Originally, 12 practice tests were planned for administration
to students in Experimental Groups I and II at an approximate rate of one test
every two weeks. However, the construction of the practice tests became a
much more complex task than had been anticipated, and the final number of
practice tests produced was 7. A time line showing the production dates for
the practice tests is included in Figure 3. Teachers administered the
practice tests approximately every three weeks, from October through March.
[Chart: production period for Practice Tests 1 through 7, plotted by month from August through March.]

Figure 3. The time period for producing each practice test.

Practice tests were constructed to increase in length of time required
for administration from 5 minutes (Test #1) to 30 minutes (Test #7). The
gradual increase in time helped students learn to work
independently for the average number of minutes required to take one subtest
on the actual test. The number of minutes and items for each practice test is
displayed by school district in Appendix C. The mean number of minutes and
items (across the four experimental districts) used for each practice test is
shown in Table 15.

Table 15

The Mean Number of Items and Minutes Used
for Each Practice Test

Practice Test    Mean Items    Mean Minutes

      1             11.3            5.0
      2             21.5           10.2
      3             28.7           13.6
      4             30.7           13.9
      5             41.0           20.2
      6             56.7           28.9
      7             56.7           28.9
Format. Four different practice test series were developed. Each series
was constructed to resemble the reading subtests (vocabulary, word analysis,
and comprehension) used by the four districts participating in the study:
Logan District (the pilot schools), the Metropolitan Achievement Test (MAT);
Cache District, the Comprehensive Tests of Basic Skills (CTBS); Granite District,
the Stanford Achievement Test (SAT); and Nebo District, the Iowa Tests of Basic
Skills (ITBS). The other portions of the tests, such as science, math, social
science, and language, were not included in the practice tests.

A copy of each of the standardized tests was obtained, and the reading
portion of the test was analyzed to determine how many items should be
included in a 30-minute practice test. For instance, if the actual reading
subtests required 90 minutes, only one-third the number of items (30/90) would
be used for a 30-minute practice test. A chart showing this computation for
practice test #7 (30 minutes) is located in Table 16. A proportional number
of items was computed for the time limitation (5 to 30 minutes) of each of the
seven practice tests. Each subtest within a practice test also contained a
proportional number of the items found in the actual test. Thus, if the
actual standardized test was 56 minutes (see CTBS, 1973) and the vocabulary
subtest was 15 minutes, a ratio of 15:56 would be maintained in the vocabulary
subtest of the practice test. That is, vocabulary would be 8.1 minutes (15/56,
rounded to .27, × 30) and have 18 items (8.1 minutes × 2.2 items per minute).
In this manner, each standardized test (MAT, SAT, CTBS, and ITBS) was examined
and the appropriate number of items was computed for each practice subtest.
(Copies of all practice tests are included in the Teacher's Manual.)
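The proportional computation can be sketched as a short function. It follows the CTBS example in rounding the time share to two decimals before multiplying, which is why 15/56 of 30 minutes comes out as 8.1 rather than 8.04. The function name, its parameters, and rounding items to the nearest whole number are assumptions; the rate of about 2.2 items per minute for CTBS vocabulary is inferred from the 18-item result, and the SAT figures are taken from Table 16 (vocabulary: 37 items in 20 minutes, out of 90 minutes for the five SAT subtests).

```python
def practice_subtest(subtest_minutes, total_minutes, items_per_minute, practice_minutes=30):
    """Scale one standardized subtest down to a practice subtest.

    Returns (minutes, items) for the practice version of the subtest.
    """
    share = round(subtest_minutes / total_minutes, 2)  # e.g., 15/56 -> .27
    minutes = round(share * practice_minutes, 1)       # e.g., .27 x 30 -> 8.1
    items = round(minutes * items_per_minute)          # e.g., 8.1 x 2.2 -> 18
    return minutes, items

# CTBS (1973) vocabulary: 15 of 56 minutes, about 2.2 items per minute
print(practice_subtest(15, 56, 2.2))                 # (8.1, 18)
# SAT vocabulary: 37 items in 20 of 90 minutes
print(practice_subtest(20, 90, round(37 / 20, 2)))   # (6.6, 12)
```

Passing a smaller `practice_minutes` (5 to 30) gives proportional counts for the shorter practice tests. As the footnote to Table 16 notes, the counts actually used could differ from the computed ones when a test format requires items in fixed groups.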

Another strategy employed in constructing the practice test format was to
introduce only one or two reading subtests in each of the first several
practice tests until all subtests were included. For example, SAT Practice
Test 1 was 4 minutes long and included only 7 Vocabulary items. In SAT
Practice Test 2, Vocabulary was repeated (with new words) and a second
subtest, Reading-Part A, was added. In Practice Test 3, Vocabulary was
dropped and Reading-Part A, Reading-Part B, and Word Study Skills-Part A were
included. Thus, one or two new subtests were progressively added to each
practice test until all of the subtests from the actual reading test had been
included. All of the reading subtests used in that particular standardized
test were included in the last several practice tests.

Table 16

Computations Used to Determine Number of Items to
Include in a 30-Minute Practice Test

                                   Time    Items/    Proportion of  Minutes   Number of
Test      Subtest            Items (min.)  Minute    Total Time     of 30     Items(a)

CTBS      Vocabulary         --     15     2.2       .27            8.1       18
(1973)    Sentences          --     --     1.15      .35            10.5      12
          Paragraphs         --     21     .86       .38            11.4      10

CTBS      Word Attack        40     38     1.05      .45            13.5      14
(1981)    Vocabulary         25     19     1.32      .22            6.6       9
          Comprehension      25     --     --        .33            9.9       9

ITBS      Vocabulary-A       17     8      2.13      .12            3.6       8
          Vocabulary-B       13     6      2.17      .09            2.7       6
          Word Analysis      57     20     2.85      .29            8.7       25
          Pictures           23     12     1.92      .18            5.4       10
          Sentences          16     7      2.29      .10            3.0       7
          Stories            28     15     1.87      .22            6.6       12

MAT       Word Knowledge A   17     6      2.83      .10            3.0       9
          Word Knowledge B   23     12     1.92      .18            5.4       10
          Word Analysis      35     15     2.33      .24            7.2       17
          Reading A          13     7      1.86      .11            3.3       6
          Reading B          31     23     1.35      .37            11.1      15

SAT       Vocabulary         37     20     1.85      .22            6.6       12
          Reading A          45     20     2.25      .22            6.6       15
          Reading B          48     25     1.92      .28            8.4       16
          Word Study A       30     10     3.00      .11            3.3       10
          Word Study B       35     15     2.33      .17            5.1       12

a This number is the computed number of items for practice test #7.
However, this number may be different from the number of items used in
practice test #7 due to adjustments for standardized test formats (e.g., some
subtests require items to be in groups of three).

Content. To generate the content for the practice test items, the actual
reading series used in the four school districts were identified and texts
obtained. Thus, the vocabulary words, comprehension skills, phonic sounds,
and word attack skills in the practice test were those that students had
actually studied in class. A complete list of the reading series used in the
study is found in Appendix C.

Initially, teachers periodically informed the project staff about the
pages they would be covering in their classes during upcoming weeks. Practice
test items were then constructed using content from the reading series unit
being taught at the time the practice test would be administered. For
example, if the classroom was studying Unit 4 at the time the second practice
test was administered, then the items drawn for practice test #2 would be from
Unit 4.

The original plan was to construct three different practice tests for
each classroom based on the high, medium, and low reading levels found in most
classes. Theoretically, 120 different tests could have been constructed for
each of the seven practice tests, because 40 teachers, each using curricula
at three different levels, were participating in the study.

After identifying the content to be tested, items, correct answers, and
distractors (wrong answers) were generated. To formulate distractors similar
to those used in the actual standardized tests, the ITBS, CTBS, MAT, and SAT
were closely examined and a list of the types of distractors used in the tests
was constructed. These strategies are listed in Appendix C. For example,
some of the construction strategies used in the standardized tests were words
with similar final and initial sounds, words that were similar in appearance,
and words with similar definitions or spellings.
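Some of the distractor strategies listed above (similar initial and final sounds, similar appearance) can be illustrated with a small filter over a word pool. This is a hypothetical heuristic, not the project's procedure: it uses first and last letters as a rough stand-in for initial and final sounds, and the standard-library `difflib.SequenceMatcher` similarity ratio, with an arbitrary 0.6 cutoff, for visual similarity. Words with similar definitions would still have to be chosen by hand.

```python
from difflib import SequenceMatcher

def looks_alike(a: str, b: str, threshold: float = 0.6) -> bool:
    """True if two words share enough letters, in order, to look similar."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

def distractor_candidates(target: str, word_pool: list[str], threshold: float = 0.6) -> list[str]:
    """Return pool words that begin or end like the target, or look like it.

    Letters stand in for sounds here, which is only an approximation.
    """
    target = target.lower()
    picks = []
    for word in word_pool:
        w = word.lower()
        if not w or w == target:
            continue
        if w[0] == target[0] or w[-1] == target[-1] or looks_alike(w, target, threshold):
            picks.append(word)
    return picks

print(distractor_candidates("huge", ["hug", "laugh", "small", "cat", "hold"]))
# ['hug', 'hold']
```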

After the project was started, it became clear that the production of 120
practice tests every two weeks (with anywhere from 10 to 70 items) was
unrealistic. Based on the pilot testing of the first practice tests, and
considering the amount of time needed to generate practice tests and obtain
feedback from the teachers on the pages covered in their reading texts, a more
realistic procedure was developed.

Although the textbooks varied across teachers, the basic vocabulary, word 
attack, and comprehension skills were similar within reading level: high, 
medium, or low. Therefore, a generic list of vocabulary words and reading 
skills was generated for each practice test by surveying the texts within a 
reading level. The content for the four practice test formats was then drawn 
from the appropriate list and transformed into test items. This method 
resulted in students at similar reading levels receiving the same practice 
test content across districts but with a practice test format unique to their 
district.
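The saving from the generic-list approach is easiest to see by counting test forms. The 120 figure comes from the text (40 teachers, three reading levels each); the revised count assumes exactly one form per reading level for each of the four district formats, which the procedure implies but the report does not state as a number. The function names are illustrative.

```python
def forms_original(classrooms: int, levels: int) -> int:
    """Original plan: a separate form for each reading level in each classroom."""
    return classrooms * levels

def forms_revised(district_formats: int, levels: int) -> int:
    """Revised plan: one generic content list per reading level,
    laid out in each district's test format."""
    return district_formats * levels

PRACTICE_TESTS = 7
print(forms_original(40, 3), forms_revised(4, 3))  # 120 12
print(forms_original(40, 3) * PRACTICE_TESTS,
      forms_revised(4, 3) * PRACTICE_TESTS)        # 840 84
```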

Directions for practice test and scoring. Directions accompanying each
standardized test were modified to fit the practice tests. Separate
instructions were prepared for each subtest as the test items were prepared.
(Complete copies of the directions for all practice tests are contained in the
Teacher's Manual.) Different directions were written for students in
Experimental Groups I and II. An example of the directions for one of the SAT
practice tests (#5) for Experimental Group I is contained in Appendix C. Note
that only one set of directions was necessary even though three levels of the
practice test were administered in any given classroom at the same time. This
was possible because, even though most of the content for the three levels was
different, sample items and any items in which the correct answer or stimulus
was read aloud by the teacher (e.g., "Mark the word 'dog'") were the same.
Pilot testing. After the test items, distractors, instructions, and
scoring keys were generated, the test in rough form was reviewed by project
team members to detect major errors and inappropriate items. After necessary
changes were made, blank formats and the draft test were sent to a graphic
artist who drew the necessary pictures. Next, a typist inserted all the item
content. Following the completion of the artwork and typing, the practice
tests were reviewed again by staff for errors before the pilot test. Each
level of each practice test was piloted with a small group of second-grade
children (two to three students per level of the test). The piloting was
conducted to discover any typing errors, missing numbers or letters, and
incorrect answer keys; to clarify instructions that were not easily
understood; and to note misleading and ambiguous test items. Final
adjustments were made, and the practice tests were then mass-produced and mailed
to the teachers participating in the study.

Reinforcement Procedures 

The Utah 79-80 State Refinements Project demonstrated that motivated
students scored better on standardized achievement tests than students who
were not motivated. However, this improvement in achievement test scores was
attained by paying students money if they scored better than was predicted
based on their pretest score. Clearly, it would not be practical to continue
to pay students for trying hard on a standardized achievement test. For
example, students would likely figure out that all they had to do was score
poorly on the pretest to collect more money on the posttest. In addition,
paying students based on their performance on a standardized achievement test
would violate the norming procedures for the test. One of the goals of this
project was to develop and evaluate a more practical alternative for
motivating students to do their best on standardized achievement tests.
It was decided that violations of the norming procedures could be avoided by
designing a motivational program to follow the biweekly practice tests. If
students learned the habit of "trying their hardest" on the practice tests,
it was hoped that the habit would transfer and increase the students' motivation to
try their hardest on the actual achievement test.

Rationale. It was decided that, to be effective, the procedure developed
should meet the following criteria:

1. Focus on effort, not aptitude.

2. Be motivating for the majority of students. 

3. Remain motivating for the duration of the project (6 months). 

4. Be minimally disruptive to the class that is using it and to the 
other classes in the school. 

5. Require minimal time expenditure by teacher and students. 

6. Require minimal monetary costs. 

The use of tangible reinforcers such as a token economy did not meet
several of the criteria listed above. For example, previous experience with
token economies by the project staff indicated that although they are often
initially effective, over long periods of time (as was the case with this
project), token economies often lose their appeal and become difficult to
maintain. Also, token economies are more of an exchange of goods or a payoff
for performing well on a test than a source of the desired intrinsic motivation to
perform well on tests.

The strategy that best met the criteria stated above, and was therefore
selected for the student reinforcement component, was a self-charting of
improvement procedure. Self-charting of improvement refers to a procedure
in which students earn points, which can be charted on a display (either public or
private), for each increment of improvement on the targeted task. The
effectiveness of this kind of reward system in motivating students to perform
academically has been demonstrated repeatedly (Paquin, 1978; Van Houten &
Parsons, 1975; Willis, 1974). The self-charting materials and procedures are
described below.

Description of procedure. Each Experimental Group I student received a
personal chart mounted on a brightly colored poster board in a color selected
by the student. The chart consisted of 7 horizontal bars, each bar
representing 1 practice test. Each bar was divided into 50 segments, each
segment representing one point (see Appendix D for a sample chart). Ample
blank space remained on the chart and poster board for the students to decorate
the charts with their names and other creative artwork. The bar graph
and the blank space for decorations allowed the students to personalize their
charts freely, in an attempt to make the charting process as individualized and
reinforcing as possible. Each Experimental Group I classroom also received a
3 x 4 plywood display board equipped with 30 hooks on which the students'
charts could hang. The teachers placed the display boards in a prominent
place in the room.

After the students scored their practice tests, they were to calculate
the number of points by which their score exceeded an individually established
criterion marked on their tests. This criterion was referred to as the
student's "To Beat" score. Each point that equaled or exceeded the "To Beat"
score was considered a "bonus point." The students were to graph the bonus
points on their charts by marking the appropriate number of segments in the
bar for that practice test. Approximately five minutes were allowed to graph and
decorate the chart with crayons and colored pens (see Appendix D for a
decorated chart). The charts were then returned to the display board for all
to see.
Bonus points were cumulative. The points earned on each practice test
were added to the points earned on subsequent tests for a new grand total. In 
this way, the students always had something to graph and decorate. When bonus 
points were earned, the students graphed the cumulative total (previous total 
plus bonus points). When no bonus points were earned, the previous total 
(plus 0) was entered on the chart. By allowing the bars to be graphed 
cumulatively, the charts always stayed the same height or grew taller. A 
decrease in points from one test to the next was never registered. 
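The charting rules described above amount to a small calculation, sketched here in Python as an illustration rather than the project's actual procedure. The function names are hypothetical, and the sketch assumes one reading of the text: every point at or above the "To Beat" score counts as a bonus point, so equaling the criterion earns 1 point and falling short earns 0.

```python
from itertools import accumulate


def bonus_points(score: int, to_beat: int) -> int:
    """Bonus points earned on one practice test (hypothetical helper).

    Assumes every point at or above the "To Beat" score counts, so a score
    equal to the criterion earns 1 point and any lower score earns 0.
    """
    return max(0, score - to_beat + 1)


def chart_totals(scores: list[int], to_beat_scores: list[int]) -> list[int]:
    """Cumulative total graphed on each practice-test bar, in test order."""
    earned = [bonus_points(s, t) for s, t in zip(scores, to_beat_scores)]
    # A running sum can only stay level (0 bonus points) or grow,
    # so a chart never registers a decrease from one test to the next.
    return list(accumulate(earned))


# Three practice tests: beat the criterion, miss it, then exactly equal it.
print(chart_totals([15, 18, 20], [12, 19, 20]))  # prints [4, 4, 5]
```

Because the running total can never drop, a student who misses the criterion on one test still has a bar to fill in, which is the property the procedure relies on.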

Additionally, because each child was given an individually established
criterion to beat, the higher-achieving students were not any more able to
earn bonus points than the less able students. Thus, the reward system was
set up so that students competed against themselves and other students to see
how tall they could make their graphs grow.

Project staff were responsible for determining the reinforcement
criterion for each student. These "To Beat" scores were marked on each
student's test before the tests were mailed to the teachers. Providing students
with the score to be beaten before they took the practice test was an
attempt to increase their incentive to improve.

To determine the individual criteria for the first practice test, each
teacher divided his or her class into quartiles based on the information
available at the beginning of the school year. Depending upon which quartile
they were in, the students were reinforced for scoring at or above the 20th,
40th, 60th, or 80th percentile of the test. On the subsequent practice tests,
the students were reinforced for equaling or exceeding their percentage correct
on the previous test. For example, if a student's score was 15 on a 20-item test
(i.e., 75% correct), the next test with 25 items would be assigned a "To Beat"
score of 19 (75% of 25 is 18.75, rounded up to 19). The average number of bonus
points earned by students and the frequency with which students earned no bonus
points can be used as an approximate indicator of whether the procedure was
working. During the project, students earned an average of 3.8 bonus points per
practice test.
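The criterion rule for later practice tests, and the two summary indicators mentioned above, can be sketched as follows. Rounding the carried-over percentage up to a whole item is an assumption chosen to match the 15/20 to 19/25 example; the function names and sample data are hypothetical.

```python
from statistics import mean


def next_to_beat(prev_score: int, prev_items: int, next_items: int) -> int:
    """'To Beat' score that keeps the previous percentage correct on a new test length.

    Integer ceiling division, i.e. ceil(prev_score * next_items / prev_items),
    avoids floating-point rounding. Rounding up is an assumption.
    """
    return (prev_score * next_items + prev_items - 1) // prev_items


def bonus_indicators(bonus_by_student: list[list[int]]) -> tuple[float, float]:
    """Mean bonus points per test, and the share of tests earning no bonus points."""
    flat = [b for student in bonus_by_student for b in student]
    zero_share = sum(1 for b in flat if b == 0) / len(flat)
    return float(mean(flat)), zero_share


# 15/20 = 75%; 75% of 25 items = 18.75, rounded up to 19.
print(next_to_beat(15, 20, 25))  # prints 19
```

A high mean together with a low share of zero-bonus tests would suggest the criteria are set at a level most students can reach with effort.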

Pilot testing. Before implementing the reinforcement component in the
Experimental Group I classrooms, a pilot test of the procedures was conducted
in the four pilot classrooms in Logan District. The procedures were
observed by project staff and found to be executed as intended. Thus, the
procedures were implemented as originally planned in the Experimental Group I
classes.

Training Teachers in Standardized Test Administration 

Although very little research has been done on the effects of the quality of
standardized test administration on student performance, much has
been done on factors that are related to the quality of test administration.
As discussed in the review of related literature in Chapter II, factors such
as rapport between the test administrator and students, anxiety on the part of
students, whether students check their work, and the type of test instructions
given are all related to students' performance on standardized achievement
tests. The limited research that has been done underscores the importance of
training teachers in standardized test administration techniques. For
example, White, Taylor, Eldred, and Carcelli (1981) observed 38 teachers
throughout Utah as part of the 79-80 State Refinements contract and found
that only 27% instructed students to check their work if they finished early,
and less than 10% told students to skip items they did not know
and go on to the next one. Even though the standardized test Teacher's Manual
instructs teachers to do these things, this previous State Refinements
contract indicated that many teachers have difficulty following these
instructions.
In another project, Taylor and White (1982) demonstrated that training
teachers in test administration techniques substantially influences the scores
received by students in those classes. Twenty-four classrooms were randomly
assigned to an experimental group (classes in which teachers were specifically
trained by the researchers in standardized test administration techniques) or a
control group. Students in the 12 experimental classes scored approximately
1/2 standard deviation higher on the standardized achievement test than
students who took the test from untrained teachers.

Materials used in the current project to train teachers in
standardized test administration techniques were based on the materials from
the Taylor and White (1982) project. Additions and refinements were made so
that the training was more comprehensive and targeted more specifically at the
standardized tests being used by the participating districts. These materials
were designed to provide skills to teachers in two areas: general
standardized test administration techniques and administration techniques
specific to the standardized test being used by each particular district. A
brief description of the materials in each of these areas is provided
below.

General standardized test administration procedures. General procedures
for administering standardized achievement tests were presented and discussed
in a workshop at the beginning of the school year. Topics covered during this
workshop included the purpose of standardized achievement testing, the pros and
cons of group versus individual testing, skills students need for
standardized testing that are not generally required in other schoolwork,
how to motivate children, and a general review of what is required in a
standardized administration (additional detail on these topics is included in
the Implementation of Experimental Treatment section of this report and in the
Presenter's Guide for Training Teachers in Test Administration, which is
available from the U.S. Department of Education).

Three primary types of activities were used during this workshop to
stimulate discussion and present materials to the teachers. First, prior to
the workshop, standardized achievement tests were analyzed, and items were
selected to demonstrate to teachers the types of problems students experience
on standardized achievement tests. Because of these problems, the
test results may be less valid for estimating what the student knows about a
particular content area. Items or examples from standardized achievement
tests were selected to demonstrate the following skills, which are required on
standardized achievement tests but are not generally required during
regular school activities:

1. Selecting the "best" answer from a number of choices. 

2. Eliminating attractive wrong choices. 

3. Responding on machine-scorable forms.

4. Responding to specialized directions. 

5. Working in a highly structured setting. 

6. Responding with the whole class. 

7. Identifying what question is being asked from a narrative.

8. Performing under time limits. 

9. Following advice to guess. 

10. Responding to unfamiliar figures or words. 

For example, the following item, taken from a standardized achievement
test, was used to demonstrate to teachers how students sometimes have
problems responding to unfamiliar figures or words.

Item 10: "HERE ARE FOUR GARDENS, ALL THE SAME SIZE. THE DARK PARTS
SHOW WHERE POTATOES HAVE BEEN PLANTED. . . ." [Illustration not reproduced.]
STUDENTS MAY THINK THEY ARE ON THE WRONG SET OF ANSWER CHOICES.
To demonstrate how children can sometimes know the content being tested
but respond incorrectly to the test item, examples of such problems were
shown, taken from a study by the Huron Institute in which children
were asked to explain why they had selected their answers. Given below
is one of those examples.

Which plant needs the least amount of water? [Illustration not reproduced.]




When asked why she answered "cabbage," the child responded, "The cabbage needs
the least water because it only needs water when you clean it." In other
words, the child knew that since the other two options were growing plants,
they would continually need water, whereas the cabbage (which had been picked)
would only need water when it was cleaned. Thus, the child knew the content
but missed the item.

Items such as those presented above were used to demonstrate all of the
skill areas children need in order to respond appropriately to standardized
achievement tests. This was done to help teachers understand the problems
that students sometimes experience. It was hoped that such understanding
would help teachers see the importance of structuring the testing situation so
that the student's knowledge of the content area is tested, rather than his
or her skill in taking standardized achievement tests.

The second major activity was a simulation. Teachers were
asked to "take" a standardized achievement test consisting of one item. The
directions for administering this test are given below.
This is a test item to be administered to participants. Since
this exercise provides experiences that illustrate points discussed
in the workshop and stimulates much discussion, it is
considered an important activity and should not be omitted.

Quickly read these directions to the test in a monotone and
quiet voice while you keep your eyes on this paper. Move
right into the test. Don't wait for participants to get
oriented. Read the following exactly as written.

"Turn to HO/A. This is a hard test so listen up. Fill in the
space above the correct answer to this question. You may not
make any other marks on the paper. Which one of these would be
the cheapest to buy? Listen carefully and I will tell you about
what they cost. The hyperbola costs more than the triangle, the
triangle costs the same as the plus, and the plus costs more than
the square. Mark the cheapest one, the one that costs the least.
Pencils down. Turn your papers over." (Correct answer: Square)

DO NOT TELL THE ANSWER UNTIL YOU HAVE GIVEN THE ITEM TWICE.
Proceed in this manner. Ask the first question below to generate
discussion. Keep it brief and encourage one-sentence responses.
They will have much to say, but try to get answers to questions
2 through 9 if they are not covered in the discussion.

1. What did I do wrong?
2. What was the question?
3. How many times did I read the question? (2)
4. How did you place the paper in front of you?
5. What is a hyperbola?
6. Do you need to know what a hyperbola is to answer the question?
7. Did you stop listening after hearing "hyperbola"?
8. What strategy did you use to figure out the answer?
9. How many answered the question?
10. In this case, how much control does a test administrator
have over the test results? (100%. Since virtually no one
will get the item correct, but they do know the content,
the examiner had control.)
11. This is a real test question; only the figures have been
changed. The wording is otherwise untouched. What grade
level do you think it is? (2nd)

Readminister the item, but follow correct test administration
practice:

1. Positive verbal reinforcement before and after testing.

2. Preparation of examinees (demonstrate how to turn the paper
upside down and fill in the answer forms).

3. Look at examinees at the end of each sentence to ensure that
they are responding correctly.

4. Pause at the end of each sentence.

After reading the item, talk about anything the participants wish
to discuss, but don't volunteer the answer. When someone finally
asks for the answer, tell them that the rule is you can't tell
them, making the point that we don't tell students the answers.
Then tell them the answer, ask how many got it right, and reward
everyone for trying. Throughout the workshop, the presenter may
use this test experience to illustrate other testing events that
create problems for students.



After taking this test, participants were asked to discuss the experience
they had just had. This item generated a great deal of discussion about
the proper and improper ways to administer standardized tests. Participants
were particularly impressed to learn that the item was taken from a second-grade
test; symbols were substituted for the names of toys so that teachers might be
unfamiliar with some of the language (the second-grade test uses a teddy bear,
roller skate, football, and doll instead of the plus, square, triangle, and
hyperbola).



The third activity consisted of critiquing a videotape developed by the
project. This tape was based on a videotape developed during the 79-80 State
Refinements contract. The tape showed scenarios of standardized testing done
correctly and incorrectly. Although the scenarios on the videotape were staged,
a number of the scenes included on the videotape had actually been
observed in classrooms. Teachers were asked to identify the correct and
incorrect test administration procedures being used.

Administration Procedures Specific to a Particular Standardized Test

Shortly before the spring administration of the district's standardized
achievement test, a second workshop was held in which teachers were given
additional training in administering the particular standardized achievement
test used by their district. This workshop reviewed the
general procedures for standardized test administration and then focused on
the procedures for the particular test used in that district. The
review of general test administration procedures presented material taken from
standardized test administration manuals and encouraged discussion from
teachers based on their experience administering the practice tests during the
year.

Even though all of these teachers had previously administered
standardized achievement tests in their classrooms, it was hoped that the
combination of the workshop in the fall and the experience of administering a
number of practice tests during the year would have sensitized them to a
number of important issues about the administration of standardized
achievement tests. For example, this discussion included issues such
as student seating arrangements for testing, how to prepare for early
finishers, clarifying ambiguities in the directions, and fostering a
supportive atmosphere for testing. A more detailed description of the types
of materials covered in this workshop is included in the Implementation
section of this report and in the Presenter's Guide, which is available from the
U.S. Department of Education.

Material presented in this workshop was developed based on the project
staff's analysis of standardized test administration manuals and their
identification of areas that might cause some students to score lower on the
test than their knowledge of the content area would warrant.
The main learning activity consisted of the teachers in the group alternating
in the role of the test administrator for portions of each subtest while the
other teachers acted as "students." These roles were rotated so that each
teacher had an opportunity to participate several times as a test
administrator. Following each section, the group discussed how the test
was being administered, provided suggestions for improvement, and identified
areas that might cause problems for students.

Summary 

The purpose of this training in test administration was to sensitize
teachers to the problems that students have during standardized test
administration, to suggest the reasons for many of those problems, to emphasize
how these problems might result in test scores being an inaccurate reflection
of what the student knows about a particular content area, and to train
teachers in techniques for substantially reducing or eliminating those
problems. By focusing on examples from actual standardized achievement
tests, simulated experiences for the teachers, and the videotape of test
administration scenarios, an effort was made to make these points interesting
and as "real life" as possible. The interactive nature of the training was an
intentional part of the design: although all of the teachers had
previously given standardized achievement tests, almost none of them had done
so in a situation where they could get feedback from others about their
administration techniques.
Sample for Research

Potential participants in the research project were selected from three
school districts located in central and northern Utah. Granite District is
located in Salt Lake City, Nebo District in the south end of Utah County, and
Cache District in Cache Valley (see Table 17 for a description of the
districts). These districts provided an appropriate accessible population
because they serve a large number and wide variety of Title I second graders
and were accessible to the project base in Logan. Alpine District, originally
proposed as a project site, was not included in the sample because an adequate
number of teachers were available in the other three districts and the project
logistics were simplified by working with three instead of four districts.
The original sample contained 22 schools, 61 classes, and 1,448 students (see
Table 18). One Cache Valley school, with two Experimental Group II classes,
left the study in March due to unscheduled demands on the teachers' time, and
one teacher in an Experimental Group II school in Granite District was dropped
from the project in early February due to ill health in her family. This
attrition resulted in a final sample of 21 schools with 58 teachers and 1,373
students. Experimental Group I had 21 classes and 522 students; Experimental
Group II had 17 classes and 412 students; and the control group had 20 classes
and 439 students. The process of determining the sample and the procedure for
assignment to experimental groups are described below.

Identification and Selection of Sample 

The process of selecting the participating districts began with an 
informational meeting which was held in May, 1981. District Title I 
coordinators from the Salt Lake and Utah Valley areas were invited to the 
meeting. Coordinators from 10 districts attended the meeting. Topics 
discussed during the meeting included previous research on standardized 
Table 17

Description of Districts Participating in Project

District   Type of           Breakdown and Number    Enrollment
           Geographic Area   of Schools

Nebo       Rural             19 - Elementary              7,224
                              3 - Middle                  1,798
                              3 - Junior High             1,744
                              4 - Senior High             2,417
                             29 - Total                  13,193

Granite    City              58 - Elementary             36,333
                             14 - Junior High            13,118
                              8 - Senior High            12,376
                             80 - Total                  62,827

Cache      Rural             10 - Elementary              5,448
                              2 - Junior High             1,794
                              1 - Senior High             1,627
                             13 - Total                   8,869



American 
Indian 



T 



63 



197 
.3* 



38 



73 



234 
.41 



Ethnic Make-Up



Hispanic 



103 
.81 



2.0J 



14 
.21 



T 



81 
.61 



1198 1175 564 



1.91 



23 
.3* 



Asian 



rrr 



13 
.IX 



.91 



32 
.41 



14 
.11 



531 
.91 



16 
.21 



Black 



ffTT 



.21 



134 137 



.2? 



Hhi 



6341 



V 



48.11 49.41 



\ 



4325 
48.81 



6521 



20824 27651 
45.91 44.9* 



4008 
45.25} 



Grades with Subject with 
Title I Title I Services 



3-6 



2-3 



Reading only 



K-3 Reading 
7-9 Math
(1 school only) 



Reading Only 



Note: Data from Annual Report of State Superintendent - USOE 1980-81, Utah Public School System.
Table 18
Experimental Sample

PILOT TESTING

Logan District: Hillcrest (Larsen, Peterson, Olsen); Riverside (Manley)



EXPERIMENTAL GROUP I



Granite 
District 



.Hillsdale 



'Jensen 
■Kane 
Kunz 
N Waldram 



Lincoln ^Archer 
x Norr1s 



Granite 
District^ 



Banks 
/ Borden 
West ^ Goner 
'Kearns >^ Green 
VLobb 
Hart in 



Nebo 
District 



Cache — 
District 



'Redwood ^Crockett 
Latham 

,Santaquin <^Burb1dge 
Payne 

^Westside ^Willis 
Anthony 

< Jenkins 
Nielsen 
Hurray 



1981 Achievement Z Score
Mean/SD = [illegible]/.91



EXPERIMENTAL GROUP II 



Granite 
District 



Gannon 
. Eber 

Western Tanner 
'Hills ■ N>Shepherd 
•Schmidt 




Stansbury < 



-Hunt 
-Miller 
"Wallace 
'Archer 



Granite - 
District 



Nebo .. 
District 



■South ^- Grose 
Kearns Madsen 
Franco 



.Goshen 



ONeff 
Boyack 



Cache 
District ' 



'Wilson -^Anderson 
Altenburg 

, Lewiston ^Mieurc 
Schenevar 

"Park ' "^TTaqgart * 
Talbot * 



1981 Achievement Z Score
Mean/SD = .06/.98



CONTROL GROUP 



Woodrow ^ Lund 

/Wilson , Cunnings 
Granite / ^Jackson 
District \ 

Roosevelt "\Pugh 
Burton 

JeWston 

Granite — Lake Ridge ^Woodland 



District 



Spackman 



Nebo 

District \ 



-Smith 

/Taylor ^Beaudin 



' Harsen 



Nebo - 
District 



N 

BrooKside ^Hason 



Ghiradelli 
Jensen 



lee 

'Haso 
s Lee 



Sumnit 



Cache / 
District \ 



^✓Jensen 
^r-Rawliiis 
NlelWill 



Hillville <— Tuddenham 
^Noble 



1981 Achievement Z Score
Mean/SD = .08/1.15



*Left project before completion.
testing, the results of the Utah 79-80 State Refinements Project, and a
description of the proposed study. The reactions of all of the people at the
meeting supported the value of a project such as this, and the Granite, Nebo, and
Alpine district coordinators said they would definitely like to participate in
the project. A similar meeting was held the next week with the district Title
I coordinators from Logan and Cache districts, who also volunteered to
participate in the project.

After the proposal was approved in July, 1981, the coordinators of
Granite, Nebo, Cache, and Logan districts were contacted again, and procedures
were initiated to obtain formal district approval for participation in the
project. District coordinators were then supplied with a letter to
revise as they wished and send to the principals of the Title I schools in
their district. The letter explained the project and requested that the
principals encourage their second-grade teachers to volunteer for the study.
(A copy of the letter is included in Appendix E.) A list of the principals in
Granite, Nebo, Cache, and Logan districts to whom the letter had been sent was
obtained from the district offices, and a project staff member contacted each
principal by phone to determine whether they would be willing to participate in the
project. At this time, principals were informed that we did not yet know to
which group (I, II, or control) their school would be assigned. This was done
to avoid a threat to the internal validity of the study findings due to the
experimental groups being volunteers. Because assignment to groups was not
done until after it had been determined that all of the accessible population
was willing to participate if selected, schools in the control group would
differ only randomly from schools in the experimental groups on the
variable of "volunteerism." Twenty-two of the twenty-three principals
contacted agreed to participate, contingent on the willingness of the
individual second-grade teachers. The only principal who declined said he
would like to participate if he could be guaranteed a slot in Experimental
Group I. Because this would have compromised the integrity of the
experimental design as described above, his school was dropped from
consideration.

A list of the second-grade teachers from the interested schools was
obtained during the second and third weeks of August, 1981. Project staff
contacted each teacher by phone to explain the purpose of the study, the
procedure for random assignment of classes to experimental groups, and the
responsibilities the teachers would have if they were selected for the
project. Again, teachers were not told which group they would be in,
since assignment to groups was not done until a sufficient number of teachers
had volunteered for all three groups. Responsibilities of treatment group
teachers included showing biweekly filmstrips, giving practice tests to their
students, and attending one or two workshops. An honorarium of $50
(Experimental Group I) or $25 (Experimental Group II) was given to
teachers for participating. Teachers in treatment and control groups were
told that observers would collect data during the spring administration of the
standardized achievement test. Sixty-one of the 66 teachers contacted
volunteered to participate in the study. The teachers who declined cited
reluctance to risk being assigned to the control group, previous time
commitments, or health problems in the family.

Assignment of the Sample to Groups 

Schools instead of classes (i.e., teachers) were randomly assigned to one
of the experimental or control groups. This assignment method ensured that
all teachers in the same school used the same treatment procedures and
reduced the possibility that treatment implementation would be contaminated
by conversations and sharing of materials among teachers in the same school
who were using different "treatments." Teachers from different schools were
expected to be less likely to share information and materials than teachers
from the same school. To assist in assigning schools to one of the three
groups, the previous spring's average achievement test score for the second
or third grade of each school was obtained. Because the districts used
different achievement tests, each school's score was converted into a
standard Z score (within each district) so that all scores were on a roughly
comparable metric. The names of the participating schools were then randomly
drawn from a box and assigned to Experimental Group I, Experimental Group II,
or the control group. After assignment, the average achievement Z score for
each group was calculated to determine whether the randomization had produced
approximately equivalent groups, which it had not. The random assignment
procedure was therefore repeated once more, at which time groups equivalent
in terms of the previous year's achievement test Z scores were obtained
(average Z scores and number of classes for each group are shown in
Table 18).
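The standardize-then-randomize procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's actual procedure: the function names (`within_district_z`, `draw_groups`, `assign_until_balanced`) are hypothetical, the population standard deviation is assumed for the within-district Z scores, the shuffled school names are assumed to be dealt to the three groups in rotation, and the balance criterion (largest minus smallest group mean Z score no greater than a `tolerance`) is a hypothetical stand-in for the judgment the project staff made when they decided the first draw was not equivalent.

```python
import random
import statistics
from collections import defaultdict

GROUPS = ("Experimental I", "Experimental II", "Control")


def within_district_z(scores):
    """Convert each school's prior-year mean score to a Z score within its district.

    scores maps school name -> (district, mean score). Using the population
    SD is an assumption; the report does not say which SD was used.
    """
    by_district = defaultdict(list)
    for district, score in scores.values():
        by_district[district].append(score)
    params = {d: (statistics.mean(v), statistics.pstdev(v)) for d, v in by_district.items()}
    z = {}
    for school, (district, score) in scores.items():
        mean, sd = params[district]
        # A district with a single school (sd == 0) is assigned Z = 0.
        z[school] = (score - mean) / sd if sd > 0 else 0.0
    return z


def draw_groups(schools, rng):
    """Shuffle the school names (the 'box') and deal them to the groups in turn."""
    names = sorted(schools)  # fixed starting order so a seeded rng is reproducible
    rng.shuffle(names)
    return {name: GROUPS[i % len(GROUPS)] for i, name in enumerate(names)}


def group_means(assignment, z):
    """Average within-district Z score for each group."""
    members = defaultdict(list)
    for school, group in assignment.items():
        members[group].append(z[school])
    return {g: statistics.mean(v) for g, v in members.items()}


def assign_until_balanced(scores, tolerance, rng, max_draws=100):
    """Redraw until the spread of group mean Z scores is within tolerance.

    Returns (assignment, group means, number of draws used).
    """
    z = within_district_z(scores)
    for draw in range(1, max_draws + 1):
        assignment = draw_groups(z, rng)
        means = group_means(assignment, z)
        if max(means.values()) - min(means.values()) <= tolerance:
            return assignment, means, draw
    raise RuntimeError("no balanced assignment found; relax tolerance or raise max_draws")


if __name__ == "__main__":
    # Hypothetical prior-year grade means for nine schools in three districts.
    scores = {
        "A": ("Cache", 50), "B": ("Cache", 60), "C": ("Cache", 70),
        "D": ("Granite", 40), "E": ("Granite", 55), "F": ("Granite", 70),
        "G": ("Nebo", 62), "H": ("Nebo", 64), "I": ("Nebo", 66),
    }
    assignment, means, draws = assign_until_balanced(scores, tolerance=0.9, rng=random.Random(1981))
    print(draws, {g: round(m, 2) for g, m in means.items()})
```

Capping the redraws with `max_draws` keeps the loop from running indefinitely when the chosen tolerance cannot be met; the project itself needed only one redraw.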

During the last week in August and the first week in September, each 
teacher was phoned and informed of the group to which they had been assigned 
and the specific responsibilities they could expect while participating in the 
project. A follow-up letter was sent to each teacher confirming their
participation in the project and their group assignment (see Appendix E for a
copy of the letter).
The following sections discuss the procedures for implementing the
research treatments designated Experimental I and Experimental II (see Table 19
for the number of classes and students receiving each treatment).
Experimental I classrooms received teacher training in test administration and
student training in test-taking skills (including filmstrips, student practice
tests, and student reinforcement for practice test performance). Experimental
II classrooms received the filmstrips and the student practice tests. No
experimental treatments were applied to control group classrooms. The four
treatment components are described below in two sections: teacher training in
test administration and student training in test-taking skills (filmstrips,
practice tests, and reinforcement). Table 20 displays the implementation
timeline for all components.

Table 19

Implementation of Experimental Treatments

                          N           Teacher Training           Student Training
Group              Classes  Students  Test Administration  Filmstrips  Practice Tests  Reinforcement
Experimental I        21       522            X                X             X               X
Experimental II       17       412                             X             X
Control               20       439


Teacher Training in Test Administration 

The Utah 79-80 State Refinements Project (1981) concluded that the
procedures teachers use during test administration contribute to how well a
student scores on a test. The data from that project also provided evidence
that teachers trained in proper standardized test administration had much
higher levels of on-task behavior and quality of test administration than did
untrained teachers, and that students in classrooms with trained teachers made
significantly fewer errors in completing their test booklets.

Table 20

Actual Timeline for Implementing Filmstrips,
Practice Tests, and Teacher Supervision

[Timeline chart not legible in the source scan. Recoverable row labels under
Teacher Supervision: Train Experimental I, Train Experimental II, On-site
Visits, Phone Visits, Group Meeting.]

Note. X = deliver or mail materials; . = implementation.

This project, building on the results of the previous project, developed,
implemented, and evaluated a more extensive and pragmatic program designed to
increase the quality of test administration. The program incorporated not
only general test administration techniques but also procedures specific to
the standardized achievement test actually used by each district.

Only the 21 teachers assigned to Experimental Group I participated in
the program for training teachers in standardized test administration. This
training was presented in two structured workshops: the fall workshop was
conducted in September at the beginning of the project, and the spring
workshop was held shortly before the districts' spring achievement testing
(see Table 21 for a breakdown by district).

Table 21

Training in Test Administration
Breakdown by District

Workshop    District   N   Date                 Duration
Fall        Cache       3  September 12, 1981   2 hours
            Granite    11  September 12, 1981   2 hours
            Nebo        4  September 12, 1981   2 hours
(Make-up)   Granite     3  September 19, 1981   2 hours
Spring      Cache       3  March 11, 1982       3 hours
            Granite    14  March 12, 1982       3 hours
            Nebo        4  March 11, 1982       3 hours

The goals, agenda, implementation procedures, and teacher evaluations of
each workshop are described below.

Fall workshop. The fall workshop was presented by five project staff in
Salt Lake City on September 12, 1981, from 9:00 a.m. to 4:00 p.m. The three
absentees, all from Granite District, participated in a make-up workshop on
September 19, 1981 (see Table 21).

The primary purpose of this workshop was to train Experimental Group I 
teachers in the general procedures of proper standardized test administration. 
It was conducted in the fall for two reasons. First, the techniques presented
allowed the teachers to practice proper test administration procedures while
administering the seven student practice tests described earlier. Second,
other project-related information concerning the student training materials,
the purpose of the research, and logistics needed to be given to teachers at
the beginning of the project. Because this workshop was already scheduled, it
was a natural time to include training in test administration.

The workshop objectives that pertained to training teachers to
administer standardized tests were as follows:

Participants will be able to:

1. Identify testing problems unique to the school district.

2. Differentiate behaviors required of teachers and students during
testing from behaviors exhibited during regular instruction.

3. List motivational, test-taking, and test administration practices
that increase the validity of test results.

4. Produce

a. a list of potential test-taking reinforcements.

b. a statement of testing purpose for explaining to students.

5. Practice

a. taking a test.

b. teaching test-specific directions.

c. completing a checklist of appropriate test administration
practices.

d. using the Teacher Index to Valid Test Performance.

6. Identify correct and incorrect test administration practices in
videotaped classroom testing scenes.

A brief summary of each agenda topic is provided below (for more complete
information, see Presenter's Guide for Training Teachers in Test
Administration).

I. Introduction: Participants identified testing problems, took a
simulated test, and filled out the Participant Inventory so they could
assess their own pre-workshop knowledge of proper test administration
(see Presenter's Guide).

II. Valid Test Results: The goals of achievement testing and the concept
of validity were discussed. Factors that contribute to low test scores
were presented, as well as the advantages and disadvantages of group
testing.

III. Motivation: Techniques to structure the environment to encourage
students to try their best were presented and discussed.

IV. Test-Taking Skills: Skills that students need during test taking, but
that are not generally required during regular school activities, were
explained and simulated with actual achievement test items.

V. Test Administration: Techniques for obtaining more valid results were
presented. The teachers practiced these procedures while administering
sections of a standardized achievement test to each other. The Quality of
Test Administration Checklist (located in the Presenter's Guide) was
presented and discussed. The Teacher Index to Valid Test Performance form
(located in the Presenter's Guide), used to document disruptive events that
may occur during testing, was explained.

VI. Videotape Observation: A videotape developed during the Utah 79-80
State Refinements Project (1981) was shown to illustrate the effect of
various test administration procedures on student behavior. Scenarios
depicting both correct and incorrect test administration techniques
were critiqued by the participating teachers. The following testing
activities were shown in the videotape: preparing students for the
test, arranging the testing room, distributing the test materials,
giving directions, monitoring students, using an aide, providing
assistance to the students, pacing, and obtaining group responses.
VII. Summary: The teachers took the Participant Inventory again so they
could assess the degree to which they had acquired the skills and
information presented to them during the workshop.

VIII. Feedback and Written Evaluation: An evaluation form was
distributed to all the teachers and collected at the end of the
workshop. Results, shown in Table 22, indicate that participants judged
the workshop very successful in meeting the objectives of the project and
their own perceived needs.
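Each mean in the content-and-format section of Table 22 is a frequency-weighted average of the ratings. The sketch below assumes the five frequency columns correspond to scale points 1 through 5 (the column labels are not legible in the source scan); under that assumption the first item's counts, 0, 0, 1, 9, and 9, give (3 + 36 + 45) / 19 = 84/19, or about 4.42, which matches the tabled value. The function name `mean_rating` is hypothetical.

```python
def mean_rating(freqs, scale=(1, 2, 3, 4, 5)):
    """Return the mean rating for one evaluation item.

    freqs[i] is the number of respondents who chose scale[i]; the default
    scale assumes the five columns stand for ratings 1 through 5.
    """
    if len(freqs) != len(scale):
        raise ValueError("need exactly one frequency per scale point")
    n = sum(freqs)
    if n == 0:
        raise ValueError("item has no responses")
    # Weight each scale point by how many respondents chose it.
    return sum(f * s for f, s in zip(freqs, scale)) / n


# First content-and-format item in Table 22 (N = 19).
print(f"{mean_rating([0, 0, 1, 9, 9]):.2f}")  # prints 4.42
```

The same function applies to the Table 24 items, where N is 18 for most rows.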



Spring workshop. Three spring workshops, one in each participating
district, were presented on March 11 or 12, 1982 (see Table 21). Separate
workshops for each district were held by project staff in the schools so
that teachers could attend right after school. The workshop was conducted in
the spring to increase the likelihood that the information provided would be
recalled and used by the teachers when actually administering the districts'
standardized achievement tests in April. Each workshop was approximately
3 hours long. There were no absentees.

The primary purpose of the spring workshop was to train Experimental
Group I teachers in the specific test administration procedures relevant to
the district-adopted achievement test they would be administering to their
students in April.

The workshop objectives were as follows:

Participants will be able to:

1. Administer the publisher's practice test using proper test
administration techniques.

2. Administer the standardized achievement test to their students
using proper test administration techniques.

Items from the spring workshop agenda are summarized below. (A copy of
the spring workshop materials is included in the Presenter's Guide.)

I. Things to Do: This topic included specific activities for the
teacher to do before the testing date, just before testing,
during testing, and after testing.
Table 22

Workshop Evaluation Form

Date: September 12, 1981    Location: Salt Lake City    N = 19

I. EVALUATION OF WORKSHOP STAFF

KNOWLEDGE OF SUBJECT MATTER
19 Very well informed
   Adequately informed
   Not well informed
   Very poorly informed

ATTITUDE TOWARD SUBJECT
19 Enthusiastic
   Rather interested
   Routine interest
   Disinterested

ABILITY TO EXPLAIN
17 Clear and to the point
 2 Usually adequate
   Somewhat inadequate
   Totally inadequate

LEVEL OF PRESENTATION
15 Very well suited to participants
 4 Moderately well suited to participants
   Completely above participants
   Completely below participants

ATTITUDE TOWARD PARTICIPANTS
16 Very helpful and understanding
 2 Interested
 1 Routine, neutral
   Distant, cold, aloof

METHOD OF PRESENTATION
 6 Ingenious, creative
13 Interesting, held attention
   Somewhat monotonous
   Uninteresting, boring

OPPORTUNITY FOR DISCUSSION
   Too infrequent
18 Appropriate
 1 Too frequent

OVERALL RATING OF WORKSHOP STAFF
14 Outstanding
 5 Better than average
   Average
   Below average
   Poor

II. EVALUATION OF WORKSHOP CONTENT AND FORMAT

Frequency of rating
 1   2   3   4   5   Mean  Statement
 0   0   1   9   9   4.42  The objectives of the workshop were clear from the beginning.
 0   0   0  16   3   4.16  The balance between lecture and participant interaction in the workshop was ideal.
 0   0   0   8  11   4.58  The workshop material contributed well to our overall goals and objectives.
 0   0   0   7  12   4.63  The workshop was well structured and organized.
 0   0   0   9  10   4.53  The content of the workshop was presented in a clear and understandable manner.
 0   0   2   9   8   4.32  The scope and coverage of this workshop was appropriate.
 0   0   0  13   6   4.32  Content was summarized well and major points were easy to identify.
 0   0   1  10   8   4.37  The value I derived from this workshop was well worth the time required of me to participate.
 0   0   0   8  11   4.58  The workshop provided specific guidance and ideas which I can apply in my job responsibilities.
 0   3   2  14   0   3.58  The total length of the workshop was appropriate.
 0   2   2  14   1   3.73  Workshop arrangements (location, rooms, prior information, schedules) were adequate.

Table 22 (continued)

III. OVERALL EVALUATION

1. OVERALL RATING OF WORKSHOP
13 Outstanding
 6 Better than average
   Average
   Below average
   Poor

2. Specific points which were valuable or significant to me were
(list at least two):
Reinforcement/motivation  10
Videotape on test administration  1
Practice tests/group response  4
Introduction to test-taking/examples of difficult items  9
Good visual aids  1
Other uses for test skills  1
Filmstrip  1
Role playing of students  1
Good workshop staff  1

3. The workshop would have been more valuable to me if:
(list at least two, particularly refer back to items you rated low in
first two sections)
Split into two half-day sessions  2
If I'd had a choice about participating  1
Too warm
Closer with less travel  1
Practice test was too long  4
Shorter lunch  2
Shorter workshop  2
Listing do's and don'ts on videotape  1
Nothing

4. If you had to shorten this workshop by ½ hour, what would you delete?
Nothing  2
Going through practice test  4
Generally condense  2
Practice direction giving  7
Gotten lunch orders at first  1
Percentages about student and teacher performance  1
1st group question-answer period  1
Two half-day sessions  1
II. Understanding the Nature and Purpose of the Test: The purpose
of standardized testing, the assumptions of the publisher, and the types
of subtests in each test were presented and discussed.

III. Schedule: Strategies for timing, breaks, avoiding testing days
close to holidays or special events, and use of the school day
were presented.

IV. Use of Proctor or Aide: Proctor/student ratios, classroom
management, and test management with the use of a proctor or aide were
presented.

V. Informing Students and Parents of Impending Test: Procedures for
informing students and parents about the testing schedule, what will
be tested, how the results will be used, special preparation for the
student, and student concerns were discussed.

VI. Seating: The use of separate desks, proper desk positioning, and
teacher contact during the test were encouraged.

VII. Early Finishers: Teachers were instructed to remind students to
check their work and to provide a nondisruptive task, such as
drawing, for early finishers.

VIII. Eliminating Distractors: Teachers were warned of potential
distractors and given suggestions on how to minimize them.

IX. Facilitating a Supportive Atmosphere: Student anxiety about
test taking was discussed, along with suggestions on how to create a
supportive atmosphere.

X. Reading Directions Carefully/Clarify Ambiguities: Proper
procedures for reading directions were outlined. Teachers were
informed of the extent to which they may add directions for the
purpose of clarification or otherwise assist the students.

XI. Monitoring Students: Unobtrusive and supportive ways of
monitoring students were presented to prevent cheating, discourage
random guessing, and prompt dawdlers.

XII. Answering Student Questions: Teachers were informed of the
benefits of responding to student questions about specific test
items after the test is completed. Suggestions for managing
such a classroom discussion were provided.

XIII. Preparation of the Test Booklet for Scoring: Teachers were
instructed about responsibilities such as erasing extraneous
marks on student booklets, darkening circles that were filled in
too lightly, copying over tests that were ripped, and situations
that may necessitate invalidating a subtest.

XIV. Use of Valid Test Performance Index: A rationale for using the
Valid Test Performance Index to document disruptive events during
testing was provided. Teachers were instructed on how to use the
Index.
XV. Practice/Review of Standardized Achievement Tests: Each teacher
was provided with an analysis, by subtest, of the test they would be
using. The analysis included a description of each subtest, test
vocabulary, time limits, and notes for giving directions keyed to
specific items on the test (see Presenter's Manual, Spring Workshop
Materials). Teachers practiced administering selected items from each
subtest in role-play situations.

XVI. Practice/Review of Publisher's Practice Test: The rationale for
using the publisher's practice test and strategies for using it to
its optimal benefit were provided. The teachers practiced
administering the practice test in role-play situations (see
Spring Workshop Materials, Presenter's Guide).

XVII. Feedback and Written Evaluation: Written documentation of the
teachers' evaluation of the workshop was obtained on the Final
Project Evaluation Form (see Table , items 36-39).

Student Training in Test-Taking Skills 

This section discusses the implementation of the three student training
components described earlier: filmstrips, practice tests, and reinforcement
procedures. Because the implementation of the three components is so
interrelated, activities are presented chronologically and refer to both
Experimental Group I and Experimental Group II classrooms except for the
reinforcement procedures or where otherwise indicated.

Training teachers to implement student training components. To train
Experimental Group I and Experimental Group II teachers to implement the
student training components, two workshops were conducted in fall 1981 by
five project staff. Eighteen Experimental Group I teachers were trained to
implement the Treatment I components (filmstrips, practice tests, and
reinforcement procedures) in conjunction with the Test Administration Workshop
on September 12, 1981. Twenty Experimental Group II teachers were trained to
use the filmstrips and practice tests at a workshop held in Salt Lake City on
September 19, 1981. Each workshop was four hours long. Three Experimental
Group I teachers from Granite District who could not attend the workshop on
September 12 were trained on September 19 with the Experimental Group II
teachers (see Table 23 for a breakdown).

Table 23

Workshop in Student Training Implementation
Breakdown by District

Experimental Group   District   N   Date                 Duration
I                    Cache       3  September 12, 1981   4 hours
                     Granite    11  September 12, 1981   4 hours
                     Nebo        4  September 12, 1981   4 hours
I (Make-up)          Granite     3  September 19, 1981   4 hours
II                   Cache       4  September 19, 1981   4 hours
                     Granite    12  September 19, 1981   4 hours
                     Nebo        4  September 19, 1981   4 hours

Note. Three Experimental Group II teachers left the project before
completion (two from Cache and one from Granite).
There were three goals for the Student Training Materials
Implementation Workshop:

1. To train teachers in the use of the student training components to
which they had been assigned.

2. To train teachers in the documentation and communication procedures
necessary for project operation.

3. To schedule the student training dates and collect the curriculum
information necessary to develop the practice tests.

A brief summary of each agenda topic is presented below. 

I. Overview: The findings of the Utah 79-80 State Refinements Project,
the research objectives and outcome measures for this study, and a
brief introduction to the treatment components were presented.



II. Basic Instructional Philosophy and Procedures: The rationale for
using a direct instructional approach was discussed, and the
procedures (model, lead, test, and correct) were explained.



III. Plan for Student Training: The schedule for implementing the student
training components throughout the year was presented.

IV. Filmstrip Training Package: The interactive format of the
filmstrips, the topics covered in each filmstrip, and the workbook
activities were explained. Segments of Filmstrips #1 and #2 were shown to
the teachers as they played the role of second grade students. This
illustrated the procedures necessary for proper implementation of the
filmstrip package.
V. Practice Tests: The rationale for training students in test format
and the procedures for administering the practice tests were
explained. The teachers role-played second grade students as they
took a sample practice test. This illustrated the proper administration
procedures for the practice test component.

VI. Reinforcement Procedures (Experimental Group I teachers only):
Teachers were presented with the rationale for using the
reinforcement procedures. Implementation of the procedures, including
scoring the practice tests, training the students to calculate the
bonus points they earned, and using the reinforcement chart, was
explained. The teachers went through the procedure as they
role-played second grade students.

VII. Communication Procedures: The procedures for returning the biweekly
tests, updating project staff about reading curriculum progress,
maintaining accurate attendance records on the appropriate form,
phone consultations, and on-site visits were explained.

VIII. Yearly Scheduling: Teachers and project staff scheduled their first
filmstrip and on-site visit and outlined the expected curriculum
progress for the year. Contact logs to document communication
between the teachers and project staff were presented.

IX. Feedback and Written Evaluation: An evaluation form was distributed
to and completed by all the teachers. Because the September 12, 1981
Student Training Materials Implementation Workshop was held for
Experimental Group I teachers concurrently with the Test Administration
Workshop, both workshops were evaluated together on the same form, and the
results are reported in Table 22. The results of the September 19, 1981
workshop are presented in Table 24. Participants' ratings indicate that
both workshops successfully met their goals.

Teacher's Manual. In addition to the workshop training, a Teacher's
Manual was developed to provide the participating districts with all the
materials needed to implement the student training (with the exception of the
filmstrips and tapes, which were included in a separate package). The manual,
How to Take Tests—Team Teaching with Professor Owl, includes all the written
student training curricula produced for the project and the rationale
supporting the format and content used. It is arranged in three sections
(Filmstrips, Practice Tests, and Reinforcement) and provides instructions for
using the materials, master copies of consumable items, and supplementary
activities for review.



Table 24

Teacher Training in Student Curriculum Workshop

Workshop Evaluation Form

Date: September 19, 1981    Location: State Office of Education    N = 18

I. EVALUATION OF WORKSHOP STAFF

KNOWLEDGE OF SUBJECT MATTER
1? Very well informed
 5 Adequately informed
   Not well informed
   Very poorly informed

ATTITUDE TOWARD SUBJECT
16 Enthusiastic
 2 Rather interested
   Routine interest
   Disinterested

ABILITY TO EXPLAIN
11 Clear and to the point
 7 Usually adequate
   Somewhat inadequate
   Totally inadequate

LEVEL OF PRESENTATION
15 Very well suited to participants
 3 Moderately well suited to participants
   Completely above participants
   Completely below participants

ATTITUDE TOWARD PARTICIPANTS
15 Very helpful and understanding
 3 Interested
   Routine, neutral
   Distant, cold, aloof

METHOD OF PRESENTATION
   Ingenious, creative
12 Interesting, held attention
   Somewhat monotonous
   Uninteresting, boring

OPPORTUNITY FOR DISCUSSION
   Too infrequent
17 Appropriate
 1 Too frequent

OVERALL RATING OF WORKSHOP STAFF
10 Outstanding
 8 Better than average
   Average
   Below average
   Poor

II. EVALUATION OF WORKSHOP CONTENT AND FORMAT

Frequency of rating
 1   2   3   4   5   Mean  Statement
 0   0   0  12   6   4.33  The objectives of the workshop were clear from the beginning.
 0   0   1  11   6   4.27  The balance between lecture and participant interaction in the workshop was ideal.
 0   0   1   9   7   4.35  The workshop material contributed well to our overall goals and objectives.
 0   0   1  10   7   4.33  The workshop was well structured and organized.
 0   0   1  11   6   4.27  The content of the workshop was presented in a clear and understandable manner.
 0   0   0  11   7   4.38  The scope and coverage of this workshop was appropriate.
 0   0   0  12   5   4.29  Content was summarized well and major points were easy to identify.
 0   0   5   7   6   4.05  The value I derived from this workshop was well worth the time required of me to participate.
 0   0   4   9   5   4.05  The workshop provided specific guidance and ideas which I can apply in my job responsibilities.
 0   0   3  10   5   4.11  The total length of the workshop was appropriate.
 0   0   0  16   2   4.11  Workshop arrangements (location, rooms, prior information, schedules) were adequate.

III. OVERALL EVALUATION

1. OVERALL RATING OF WORKSHOP
 8 Outstanding
10 Better than average
   Average
   Below average
   Poor

2. Specific points which were valuable or significant to me were:
(list at least two)
Simulated test item  1
Filmstrip presentation  5
Sample test  1
Demonstrate with children (so they see it done)
All materials prepared for teacher
Standardized testing is not a part of teacher training  3
Statistics (of test formats)  3
Role of teacher in testing  1
Answered questions of teacher role in project  3
Help students

3. The workshop would have been more valuable to me if:
(list at least two, particularly refer back to items you rated low in
first two sections)
More input on filmstrip and prep.  1
Coffee.
Hard to give up a Saturday. Not sure of work involved in project.  3
More questions on group standardized testing.
Implement concepts.
Baby was distracting.



The manual was developed and added to as the student training components
were produced. The 21 Experimental Group I and 17 Experimental Group II
teachers each received a large three-ring notebook with labeled dividers to
bind and organize the materials, which were sent to them with each filmstrip
and practice test. The materials were hole-punched and ready for insertion
into the manual. A listing of the materials in each section is provided below.

I. Introduction: Organization of the manual and materials.

II. Filmstrips:

General Information: rationale for student training.
Teacher/Filmstrip: interaction of filmstrip instruction with teacher
behavior.
Instructional Sequence: explanation of direct instruction strategy.
General Instruction: tasks required to show filmstrips.
9 Filmstrip Scripts: for teacher preparation and for the
projectionist to use in turning the frames.
9 Masters for Work Booklets: for duplicating student practice
material.
Practice and Review Sheets: laminated or master copies for
supplementary activities to use as needed or just for fun.

III. Practice Test Section:

General Information: a rationale for training students in test
format.
Construction of the Practice Tests: explanation of the development of
the tests.
Procedures: how to use tests properly.
Directions: explanation of the individual test directions.
General Procedures: instructions to the teacher for administering the
practice tests.
Scoring Procedures: directions for instructing the students how to
score the practice tests.
Laminated Scoring Cards: for giving students examples of scoring.
7 Practice Test Masters: to duplicate for distribution to students.
7 Practice Test Directions: individual test directions to direct
students through each test.

IV. Reinforcement Section:

Motivation Program: explanation of the rationale for the program and
procedures for implementation.
Charting Instructions: specific instructions to teach children to use
the program.
Sample Chart: for explanation to students.
Master Chart: for reproduction as needed.
Laminated Chart: for teaching students how to fill in graphs and
calculate points.
Typical implementation. The typical cycle used to implement student
training proceeded through teacher preparation, showing the filmstrip,
practice test administration and scoring, reinforcement implementation, and
return of tests to USU. The details of each activity are explained below.

Upon receiving the classroom materials from USU, the teacher would
prepare the lessons and schedule related activities to occur within two weeks.
According to teachers' self-reports, preparing to conduct a typical filmstrip
and practice test took an average of 11.9 minutes and included the following:

1. Duplicate extra practice tests and filmstrip booklets from the master
copy for any new students.

2. Arrange for someone to turn the filmstrip projector. 

3. Read script accompanying filmstrip (optional). 

4. Read test directions (optional). 

5. Post the review charts, or copy them for each student.

6. Position filmstrip projector and tape recorder in room. 

To implement the filmstrip and tape lesson, the teacher would first use
review charts to prompt students on concepts taught in previous filmstrips.
(See the review charts that accompanied each filmstrip lesson in the Teacher's
Manual.) After a short review of 2-5 minutes, the teacher would pass out
individual student booklets and start the tape while the filmstrip turner ran
the projector. Most of the instruction was delivered to students via the
filmstrip, but the teacher could control the pace as much as she/he wanted
and would personally direct the class for three types of exercises:

1. when asked by Professor Owl to teach or quiz the students on a 
difficult concept, 

2. to supplement the filmstrip instruction when students were having 
difficulty understanding, 

3. to supervise students as they worked through practice items in their
booklets.
Following the filmstrip, students could take their booklets home. The
practice test was given on a different day from the filmstrips. Although
there was not a one-to-one correspondence between the objectives of a
particular filmstrip and the following practice test, the tests were usually
scheduled between two filmstrips. Typically, the practice test would be
given one or two days following a filmstrip.

Before administering the practice test, the teacher was encouraged to
review previously taught test-taking skills. After passing out the
individually identified tests to the correct students, the teacher would begin
reading directions and giving the test. The directions were structured so
that all three levels could be given at the same time, yet students would not
realize that tests differed depending on the reading level. The length of the
test and the time allowed for completion increased with each practice test
(from 5 minutes on test #1 to 30 minutes on test #7).

Immediately following the test, red pencils supplied by the project
were passed out to the students, and black lead pencils were put inside desks.
Scoring directions were reviewed according to the needs of the students. As
the teacher read the answers, students marked their own papers. Then the
number of correct answers was tallied and entered in a box marked "SCORE" on
the test cover. The mean percent correct for all practice tests was 82.75
(SD = 14.72).

In addition to viewing filmstrips and taking practice tests, Experimental
Group I students participated in the reinforcement procedures. They were
verbally encouraged to try their best to score high on the practice tests and
to beat a cut-off score assigned to them based on a previous test score.
Reinforcement procedures were initiated immediately after the students scored
their tests. The front cover of each individual practice test contained three
labeled boxes (see the Teacher's Manual for the cover sheets): "SCORE", "TO
BEAT", and "POINTS". Each test contained an individually predetermined "TO
BEAT" number based on the student's score on the previous practice test. After
filling in the boxes, students would compute their "POINTS" by subtracting
"TO BEAT" from "SCORE". The teacher helped with the computation for students
who could not subtract. The mean number of reinforcement points earned per
student per practice test was 3.8 (SD = 1.4).
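The cover-sheet arithmetic reduces to a one-line rule. The Python sketch below is illustrative only: the function name `reinforcement_points` is hypothetical, and returning zero (rather than a negative number) when SCORE does not exceed TO BEAT is an assumption, suggested by the report's later remark that some students did not earn points.

```python
def reinforcement_points(score: int, to_beat: int) -> int:
    """Points earned on one practice test.

    POINTS = SCORE - TO BEAT, as computed on the practice-test cover sheet.
    Assumption (not stated in the report): a score at or below the
    TO BEAT value earns zero points rather than a negative total.
    """
    return max(score - to_beat, 0)


if __name__ == "__main__":
    # Hypothetical cover-sheet values: (SCORE, TO BEAT)
    for score, to_beat in [(18, 15), (15, 15), (12, 15)]:
        print(score, to_beat, reinforcement_points(score, to_beat))
```

A student who scores 18 against a TO BEAT of 15 records 3 points; scores of 15 or 12 against the same cut-off record none under this assumption.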

Students obtained their reinforcement charts from the reinforcement 
stand and copied the data from test to chart. Charts were graphed and shown 
to the teacher, who was encouraged to praise the students for progress. 
Students then returned the chart to the stand for public display and gave 
their tests and red pencils to the teacher. 

Teachers were instructed to record the names of students who were absent
during the filmstrips and practice tests. When convenient, absent students
"made up" the test and filmstrip. Tests were then mailed back to the project
for analysis, and filmstrips were either passed on to another teacher or stored
for later use.

Throughout the project year, USU staff maintained contact with teachers
through classroom visits and phone calls. Table 25 displays the frequency and
types of USU-teacher contact. The interactions with teachers served two
purposes:

1. To support and reinforce the teachers during their facilitation of
project components. A higher degree and quality of implementation
was expected from teachers who were contacted and rewarded
frequently.

2. To correct problems and modify implementation strategies to fit
unique situations. The most efficacious method to ensure proper
implementation of a program is to stop misconceptions at their
inception, model the correct procedure, and maintain frequent
follow-up.

Table 25

Number of Contacts Between Project Staff and District Staff

                         Model Procedures                  Observe Procedures
           Number of   Film-    Practice  Reinforce-    Film-    Practice  Reinforce-   Number of
District   Teachers    strips   Tests     ment          strips   Tests     ment         Phone Calls
Nebo            8        15       13         8            17       11         8             40
Granite        25        16       22        16            32       15        14            236
Cache           5         5        5         5             5        -         -             25



As indicated earlier in Table 20, supervision of teacher implementation
by project staff began soon after teachers were trained to use the components.
In conjunction with the first two filmstrips and the first practice test,
staff visited all classrooms to model procedures and observe the components
being used. The average number of visits made per teacher was 4.1. After the
initial visits, follow-up observations were made for those teachers who needed
more assistance.

Teachers who were judged to be implementing the program correctly were
phoned periodically to discuss progress. As indicated in Table 20, phone
visits were conducted from December 1, 1981 to April 23, 1982. An average of
7.9 phone visits was made per teacher.
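As a quick arithmetic check, the average of 7.9 phone visits per teacher can be reproduced from the district totals in Table 25. This short sketch assumes the garbled Nebo phone-call entry reads 40, which is the value that makes the district totals agree with the reported average.

```python
# Phone calls and teacher counts per district, as read from Table 25.
# Assumption: Nebo's OCR-garbled phone-call entry ("AO") is 40.
phone_calls = {"Nebo": 40, "Granite": 236, "Cache": 25}
teachers = {"Nebo": 8, "Granite": 25, "Cache": 5}

total_calls = sum(phone_calls.values())    # 301 calls in all
total_teachers = sum(teachers.values())    # 38 participating teachers
calls_per_teacher = total_calls / total_teachers

print(f"{calls_per_teacher:.1f} phone visits per teacher")
```

The 38-teacher total is also consistent with the Table 29 percentages, where a single respondent corresponds to 2.6 percent.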

From January 18 to February 1, staff members conducted small-group
meetings with teachers at each school. During these meetings, teachers were
asked to express their positive and negative feelings toward the project.
Ideas for more efficient implementation and management were shared, and
project staff suggested methods for smoother operations. Teacher feedback
concerning filmstrips and practice tests was recorded, and recommendations
about how the project could be improved were noted.

Delivery of materials. Project materials (filmstrips and practice tests)
were periodically hand-carried or mailed to teachers (Table 20 indicates the
delivery dates). Teachers scheduled the filmstrips on different dates from
the practice tests and within two weeks of receiving the materials. Individual
student reinforcement boards and classroom reinforcement stands were delivered
to each teacher in Experimental Group I before the first practice test
administration.

Absentees and attrition. Students who were absent the day a practice
test was administered or a filmstrip shown were given make-ups whenever
possible. Teachers kept records showing which students did not participate
in which activities so that absenteeism could be accounted for in the data
analyses. Tables 26 and 27 show the number of students who were absent and
present for each filmstrip and practice test. The mean class attendance for
both filmstrips and practice tests was 25 students. The mean class
absenteeism was 0.9 students per filmstrip and 1.4 students per practice test.

Evaluation of project implementation. Data used to evaluate project
implementation came from three sources: teacher judgments about individual
filmstrips, practice tests, and reinforcement procedures; teacher judgments
about the project as a whole; and staff judgments of the quality of individual
teachers' implementation.

Filmstrips. Evaluation data on the filmstrips were collected on
filmstrip evaluation forms (see Appendix F), which the teacher filled in
and mailed to USU immediately after showing each filmstrip. No
evaluations were conducted on Filmstrips #1 and #2 because staff were in
Table 26

Number of Classrooms, Students Present, and Students Absent
for Each Filmstrip

Totals
21 531 7 524 14 519 13 504 18 507 26 21 417 17 419 408 21 17 409 25 17 423 14 16 404 11 14 348 15 17 25

Both
27 38 25 .9

a Average number of students present per class per filmstrip.
b Average number of students absent per class per filmstrip.
Table 27

Number of Classrooms, Students Present, and Students Absent
for Each Practice Test

District
Group

a Average number of students present per class per practice test.
b Average number of students absent per class per practice test.
the classrooms and brought observational reports back to guide revisions
and the development of future filmstrips.

Results from the evaluations of Filmstrips #3 through #9 are shown in
Table 28. The first section of the form asked teachers to use a 4-point
scale, from strongly agree (1) to strongly disagree (4), to rate positive
statements about the filmstrips. The mean response to the 11 statements was
1.8 (between agree, 2, and strongly agree, 1). The most positive teacher
reaction concerned Teacher Involvement: teachers felt their involvement was
clearly defined, easy to accommodate, and appropriate. Although no statement
received negative feedback (disagree, 3, or strongly disagree, 4), teachers
felt less positive (mean = 2.28) about the filmstrip length than about any
other item. Some teachers felt the filmstrips were too long because they took
time from other work, but the teachers agreed that students were not bored
and enjoyed watching the filmstrips.

The second section of the filmstrip evaluation asked short-answer
questions. Results in Table 28 show that teachers perceived a transfer
of students' test-taking skills to other subjects, spent little
preparation time (11.9 minutes per filmstrip), thought that students
learned most of the concepts (84%), taught the filmstrips themselves, and
used supplemental material 38% of the time.

Additional comments solicited from teachers indicated that the red
"highlight" was not an effective way to emphasize words, the filmstrips
were too long, more student practice was needed, and "elimination"
skills were not taught thoroughly.

Practice tests. Feedback on practice tests was collected from
teachers by phone and through written comments placed on an
Table 28

SUMMARY OF FILMSTRIP EVALUATIONS*

Columns: Filmstrip #3, Filmstrip #4, Filmstrip #5, Filmstrip #6, Filmstrip #7, Filmstrip #8, Filmstrip #9, Average for All Filmstrips

Average Rating on Scale of 1 = strongly agree to 4 = strongly disagree

FILMSTRIP EVALUATION QUESTIONS



1.97 



1.82 



1.S9 



1.S0 



2.00 



2.09 



1.82 



1.73 



1.87 



1.73 



2.44 



1.47 



1.91 



1.56 



1.70 



1.27 



1.62 



1.56 



1.97 



1.84 2.03 



1.84 1.76



2.31 



1.94 



1.81 



2.03 



1.78



1.78 



1.56 



1.75 



1.66



1.75 



1.81 



1.83 



2.54 



1.77 



1.74 



1.91



1.91 



1.85 



1.57 



1.71 



1.91 



2.00 



1.94 



1.89 



2.25 



1.97 



1.53 



2.00 



2.08 



1.79 



1.44 



1.66 



1.61 



2.05 



2.11 



1.86 



2.06 



1.73 



1.73 



1.97 



1.70 



1.73 



1.43 



1.63 



1.52 



1.76 



1.79 



1.73 



2.38 



2.28



1.70 



1.83 



1.58 



1.63 



1.70 



1.90 



1.58 



1.80 



1.66 



1.80 



1.45 



1.50 



1.64 
1.55 



1.68 



1.67 



1.77 



1.86 



1.87 



1.91 



1.7^ 



1.80 



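Questions Answered Yes/No or in Minutes

                                          #3      #4      #5      #6      #7      #8      #9   Average
Applied test-taking skills to other
  subjects (% yes)                        59      74      61      79      84      69      90      76
Preparation time in minutes,
  mean (SD)                            25.15   12.73    8.75    9.41    9.55    8.38    9.27   11.89
                                      (24.29)  (9.81)  (4.21)  (5.61)  (6.60)  (4.75)  (6.17)  (8.78)
Concepts not learned (% yes)               9      24      13      10      33      15      11      16
Respondent was the teacher (% yes)        94      94     100     100      94      96      93      96
Used the accompanying pictures (% yes)    42      32      37      48      42      28      39      38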



Filmstrip

1. The length was appropriate.

2. The story line was entertaining to the students.

3. The content addressed skills the students need to learn.

4. The figures and printing on the filmstrip were clear.

5. The dialogue was audible.

6. The filmstrip turner was able to keep up with the narrated page.

Teacher Involvement

7. The teacher was properly cued to stop the tape.

8. The amount of Owl/teacher interaction was appropriate.

9. The tasks required of the teacher were easy to accomplish and defined clearly.

Student Materials

10. The student practice was sufficient for students to apply the concepts they learned through the filmstrip.

11. The practice exercises were of the appropriate difficulty level.

TOTAL (AVERAGE FOR FIRST ELEVEN QUESTIONS)



Have students applied test-taking skills to other subjects? (Percentage answering "yes")

How long did it take to prepare to teach this filmstrip? [Average and (standard deviation) in minutes]

Were there any concepts presented in the filmstrip that were not learned by your students? (Percentage answering "yes")

Were you the teacher for the filmstrip? (Percentage answering "yes")

Did you use the pictures that accompany the filmstrip? (Percentage answering "yes")



OPEN-ENDED COMMENTS IN RESPONSE TO
"16: If you have any additional comments, please write them on the back of this form."

(Only comments made by 5% or more of the teachers for a given filmstrip are recorded.)

Filmstrip #3    Filmstrip #7

None    More practice needed.

Too long.

Red highlighting doesn't show up well.

Filmstrip #4    Concept of "eliminate" difficult for
children to learn.

Boring for children.

Red highlighting doesn't show up well.
Too long.

29%
15%

13%
16%
8%

22%
8%

Filmstrip #5

Red highlighting doesn't show up well.
Too long.

Filmstrip #6

No #4.

Too long.
Teacher needs helper.

More examples needed for identifying sounds.

16%
16%

23%
25%
8%
8%

Filmstrip #8
None

Filmstrip #9

Too long.

Error at 2nd stop: "Sob" should be "Ti
Red highlighting doesn't show up well.

Note: Filmstrips #1 and #2 were not evaluated using this form.

13%
13%
10%
identification sheet accompanying returned test forms. Feedback was
usually specific to the district's test format or to the content of the
three levels. In general, teachers felt that the content of the lowest
level of all three test formats (SAT, CTBS, and ITBS) and the format of
the ITBS were too difficult. Because the intent of administering practice
tests was to give students exposure to all facets of the reading subtest,
the practice test format was not modified. However, in response to the
feedback, the content of the lowest levels was made easier so that
low-achieving students could experience more success before taking the
district test, which would probably be very difficult for them. Teachers
also noted that later tests were too long (20-30 minutes). Because one
objective was to prepare students to take typical standardized tests
(which are often 30 minutes long per subtest), the length was not
adjusted.

Reinforcement procedures. After the first reinforcement session,
informal comments from some teachers suggested that the procedures were
difficult for teachers to explain and for students to understand.
Project staff visited those teachers (see Table 20) to model the
procedures while reteaching the process to students. During subsequent
sessions, teachers reported that students sometimes became upset when
they did not earn points. Teachers were told to encourage these students
to work harder on the next test. Because the number of points awarded
was a function of the previous test score, students rarely failed to earn
points on consecutive tests. When points were not earned on successive
tests, teachers were told to lower the "TO BEAT" score enough for the
student to earn a point. All modifications to the original plan were made
to increase the reinforcing effects of the points and in no way
jeopardized the research design or the outcome data.
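The point rule and the adjustment for students who repeatedly missed points can be combined in one small function. This is a hypothetical sketch rather than the project's written procedure: the report does not say how far the "TO BEAT" score was lowered, so setting it to one below the current score (yielding exactly one point) is an assumption, as is treating a score at or below "TO BEAT" as zero points. The function name `points_after_adjustment` is also hypothetical.

```python
from typing import Tuple


def points_after_adjustment(score: int, to_beat: int,
                            missed_last_test: bool) -> Tuple[int, int]:
    """Return (points, to_beat_used) for one practice test.

    Assumptions (not specified in the report):
    - points are SCORE - TO BEAT, clipped at zero;
    - "lower the TO BEAT score enough for the student to earn a point"
      means setting TO BEAT to one below the current score.
    """
    points = max(score - to_beat, 0)
    if points == 0 and missed_last_test:
        to_beat = score - 1          # lowered just enough for one point
        points = score - to_beat     # always 1 after the adjustment
    return points, to_beat


if __name__ == "__main__":
    print(points_after_adjustment(14, 12, missed_last_test=True))
    print(points_after_adjustment(10, 12, missed_last_test=False))
    print(points_after_adjustment(10, 12, missed_last_test=True))
```

Keeping the adjustment conditional on a previous miss mirrors the report's description that the cut-off was lowered only after points were missed on successive tests.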
Project evaluation. After the final student data were collected,
teachers in Experimental Groups I and II were mailed a Project Evaluation
Form (see Appendix F). Teachers responded to 39 statements using a
5-point Likert scale ranging from agreement (1) to disagreement (5), so
lower scores indicate stronger agreement. Statements concerned filmstrips,
practice tests, contact and communication with project staff, data
collection procedures, general impressions, reinforcement procedures
(Experimental Group I only), and the spring teacher training workshop
(Experimental Group I only). The results of the project evaluation are
presented in Table 29 by experimental group. Teachers in both groups had
similar attitudes, with a mean agreement score of 2.1. Teachers felt more
positive toward the filmstrips (1.6) than toward the practice tests (1.9)
or the reinforcement procedures (2.9).
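To make the Table 29 summary statistics concrete, the sketch below recomputes the mean, standard deviation, and percentage distribution for a single item from response counts. The counts are an assumption: they are back-calculated from the published percentages for item 1 (31.6, 60.5, and 7.9 percent), assuming all 38 teachers answered.

```python
import statistics

# Item 1 of Table 29 ("Instructions for teachers were complete and easy to
# follow"). Assumed counts, back-calculated from the published percentages
# with 38 respondents. Scale: 1 = strongly agree ... 5 = strongly disagree.
counts = {1: 12, 2: 23, 3: 3, 4: 0, 5: 0}

# Expand the counts into one rating per respondent.
responses = [rating for rating, n in counts.items() for _ in range(n)]
n_total = len(responses)

mean = statistics.mean(responses)
sd = statistics.stdev(responses)  # sample standard deviation (n - 1)
percents = {rating: round(100 * n / n_total, 1) for rating, n in counts.items()}

print(f"mean = {mean:.2f}, S.D. = {sd:.2f}")
print(percents)
```

The recomputed mean and standard deviation (1.76 and .59) match the Total column for item 1, which suggests the table reports the sample (n - 1) standard deviation rather than the population value (.58).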

Before the completed form was returned to USU, each teacher was
contacted by phone by project staff. Teachers were asked to add verbal
comments explaining their responses to statements in the five areas
(seven for Experimental Group I) listed above. These comments are
presented in Table 30. In general, verbal comments indicated positive
attitudes toward the filmstrips, the practice tests, and the project as a
whole, and negative attitudes toward the reinforcement procedures and the
filmstrip length. The teachers made several suggestions for project
improvement. The most frequent were to provide more student practice on
filmstrip concepts, increase the share of filmstrip time spent on reading
comprehension, and include skills for math tests in the instructional
sequence.

Support and quality of teachers. The extent of project
implementation very likely depended in part on the support that
teachers showed for the project and the quality with which they



Table 29

RESULTS FROM TEACHER EVALUATION: PROJECT COMPONENTS

MEAN ATTITUDE SCORE AND STANDARD DEVIATION    PERCENT OF TOTAL RESPONDENTS

Filmstrips    Strongly Agree    Neutral    Strongly Disagree



E1            E2            Total
Mean  S.D.    Mean  S.D.    Mean  S.D.


1.71 


.56 


1.82 


.64 


1.76 


.59 






1.70


.69 


1.00 


.81 


1.28 


.56 


1.23 


.56 


1.26 


.55 


1.65 


.91 


1.75 


.58 


1.81 


.73 


1 ")C 


.53 


1.64 


.'0 


1.71 


.61 


1.71 


.88 


1.66 


.98 


1.69 


.88 


1.76 

' 


.94


1.64 


.93 


1.71


.93 


1.68 


.48 


1.60 


.5 


1.63 


.49 


1.80 


.68 


1.58 


.62 


1.71 


.65 


2.00 


.10 


1.76 


.83 


1.89 


.98 


2.47 


1.20 


2.17 


.81 


2.34 


1.04


2.05 


.82 


2.00 


.79 


2.02


.80 


2.04 


1.07 


1.81 


1.04 


1.94 


1.05 


2.61 


1.07


2.43 


1.09


2.54 


1.07 


2.19 


1.07 


1.82 


.88 


2.02 


.99 


2.20 


.79


2.00 


1.00




.90 



1. Instructions for teachers were complete and easy to follow

2. The filmstrips were easy to implement in the classroom

3. The concepts taught in the filmstrips were important for students to learn

4. The filmstrips taught the concepts adequately

5. The students enjoyed the filmstrips

6. I plan to use the filmstrips in future classes

7. The filmstrips were worth the time and effort required

Practice Tests

8. ... easy to follow

9. Tests were easy to implement in the classroom

10. The test items were appropriate in content and difficulty

11. The tests adequately prepared the students for standardized testing

13. Students enjoyed taking the practice tests

14. The practice tests were worth the time and effort required

.79 2.00 1.00 .90 Total Practice Test Component: Items 8-14 



1 


2 


3 


4 


5 


31.6 


60.5 


7.9 


0.0 


0.0


47.4 


42.1 


5.3 


5.3


0.0 


78.9 


15.8 


5.3 


0.0 


0.0 


34.2 


Sc. 6 


5.3 


5.3 


0.0 


36.8 


55.3 


7.9 


0.0 


0.0 


50.0 


28.9 


10.5 


5.3 


0.0 


52.6 


31.6 


7.9 


7.9 


0.0 



39.5 


50.0 


10.5 


0.0 


0.0 


39.5 


42.1 


10.5 


5.3 


2.6 


15.8 


55.3 


13.2 


10.5 


5.3 


21.1 


57.9 


15.8 


0.0 


2.6 


39.5 


36.8 


10.5 


7.9 


2.6 


13.2 


42.1 


23.7 


13.2 


5.3 


50.0


34.2 


15.8 


0.0 


0.0 



Table 29 (cont'd)
Results from Teacher Evaluation: Project Components

MEAN AND STANDARD DEVIATION    PERCENT OF TOTAL RESPONDENTS

Contact and Communication    Strongly Agree    Neutral    Strongly Disagree

E1            E2            Total
Mean  S.D.    Mean  S.D.    Mean  S.D.




1 


2 


3 


4 


5 


1.68 


.68 


1.47


.78 


1.65 


.74 


15. The USU contact person kept me well informed


50.0 


34.2 


15.8 


0.0 


0.0 


1.57


.59


2.18 


.75


1.83 


.73 


16. I was able to reach my USU contact person


34.2 


44.7


18.4 


0.0 


0.0 














and felt comfortable in doing so, 












1.57 


.81 


2.05 


.83 


1.78 


.84 


17. My needs were responded to in a reasonable


44.7 


34.2 


18.4 


2.6 


0.0 














amount of time 












1.14 


.39 


1.52 


.62 


1.31 


.52 


18. The contact person listened and responded


71.1 


26.3 


2.6 


0.0 


0.0 














to my feedback












1.40 


.51 


1.90 


.67 


1.60 


.62 


Total Contact and Communication Component:
























Items 15-18 
























3 

Data Collection 












2.04 


.97 


1.64 


1.00 


1.86 


.99 


19. The observation during testing was non- 


39.5 


47.4 


2.6 


7.9 


2.6 














disruptive 












2.23 


.89 


1.82 


1.07 


2.05 


.98 


20. I would not mind having observers again in


31.6 


42.1 


18.4


5.3 


2.6 














similar project 












2.42 


1.00 


2.05 


.92 


2.26 


1.08 


21. Students enjoyed responding to the student


23.7 


42.1 


18.4 


15.8 


0.0 














attitude measures on Friday 












2.2 


.80 


1.80


.98 


2.00 


.81 


Total Data Collection Component: Items 19-21 

























General Impressions 












1.95


.81 


2.17 


.86 


2.05 


.84 


22. The requirements for participation in the study 


26.3 


47.4 


21.1 


5.3 


0.0 














were clearly outlined












2.14 


.91 


1.94 


.83 


2.05 


.87 


23. The benefits were worth the investment of time


26.3 


50.0 


15.8 


7.9 


0.0 


2.33 


.80 


1.82 


.88 


2.10 


.85 


24. The project was enjoyable for students 


23.7 


50.0 


18.4 


7.9 


0.0 


1.95 


.92


1.82 


.72 


1.89 


.83 


25. The project benefited students' test-taking


34.2 


47.4 


13.2 


5.3 


0.0 














ability 












2.47 


.93 


2.11 


.70 


2.21 


.84


26. The project enhanced students' performance


13.2 


52.6 


23.7 


10.5 


0.0 














in other areas 












2.14 


.91 


2.05 


.66 


2.10


.80 


27. The project was realistic in scope 


15.8 


65.8 


13.2 


2.6 


2.6 


1.95 


.62 


1.58 


1.16 




.96 


28. I am glad that I participated


47.7 


42.1 


5.3 


5.3 


2.6 


2.23 


1.13 


2.10 


.79 


2.13 


.99 


29. The fall workshop adequately prepared me for


28.9 


39.5 


23.7 


5.3 


2.6 














the tasks expected 












2.09 


1.00 


1.60 


.63 


1.88 


.89 


30. Taking tests was less anxiety-provoking for 


34.2 


44.7 


7.9 


7.9 


0.0 














students because of the project 












2.10 


.78 


1.90 


.48 


2.00 


.67 


Total General Impressions Component: Items 22-30













No 

Data 



2.6 



5.3 






Table 29 (cont'd)
Results from Teacher Evaluation: Project Components

MEAN AND STANDARD DEVIATION



PERCENT OF TOTAL RESPONDENTS 



E1

Mean S.D. 

3.14 1.63 

2.71 1.19 

2.85 1.09 

2.71 U9 

2.90 1.41 



Reinforcement 



Strongly 
Agree 



31. The reinforcement procedures were easy for 
students to understand 

32. The reinforcement procedures were easy for 
the teacher to implement 

33. Students worked hard to earn more than their
"to beat" score on the test

34. Students enjoyed the reinforcement procedures 

35. I plan to use the procedures for reinforcement 
in the future 



Neutral 



Strongly 
Disagree 



1 


2 


3 


4 


5 


14.3 


.19 


14.3 


43.8 


9.5 


14.3 


33.3 


28.5 


14.3 


9.5 


9.5 


28.6 


28.6 


23.8 


4.7 


14.3 
24.0


38.0 
14.3 


14.3 
24.0 


28.6
24.0 


4.7 
14.3 



2.86 .18. 



Total Reinforcement Component: Items 31-35 



Spring Workshop 

2.8 1.00 36. Workshop materials were clear and helpful 

2.24 1.41 37. Workshop was appropriate in length

2.28 1.42 38. Information gained from the workshop(s) was

worth the amount of time required 
2.15 1.46 39. As a result of the workshop, I was a better

test administrator 

2.10 1.30 Total Spring Workshop Component: Items 36-39



48.0 


24.0 


19.0 


0.0 


9.5 


38.0 


33.3 


9.5 


4.8 


14.3


38.0 


28.6 


14.3 


4.8 


14.3 


48.0 


19.0 


4.8 


14.3 


9.5 



Table 30

Verbal Comments from Teachers on Project

Teacher
ID

01

02
03

04
05

06

Filmstrips

07

08
09
10

11

12
13
14
15

16
17

18
19
20
21
22
23
24

25

26

27
28
29
30
31
32
33
34
35



All too long.
Too long.

Liked them basically; red color bad; too long but very good at catching
kids - they loved them, will use again.
Kids enjoyed the films; a little bit too long.
I liked them for the most part; kids enjoyed; sometimes I couldn't
understand characters; especially helped to teach elimination.
Really enjoyed program; we had a long break in beginning after we had
explained the program; should have been more consistent in time line;
our materials would sit in post office for 3 days; the end was awful -
too crammed together; I will use materials from beginning, spaced
throughout year.

Part on guessing - deduction was important but was presented too
quickly, students need more practice; too close together - students
seemed tired of program near the end; did not show #9.
Sometimes too long, but will use again.
Great.

Red ink bad; made good points.
Enjoyed; no additional comments.

Content was excellent; red lettering poor; need to divide some in half.
OK but too long (kids only last 15 min.); divide into one topic at a
time.

Content excellent; need to present one concept/film (10 min./day).
Well done, but red letters bad; OK for fast kids, too long for slow
kids.

Very enjoyable except II - kids thought it was boring (try color).
Well done, some long.

OK, went well - see evaluation forms.

Pretty good; too long; spaceman great.

At first it was shaky; once in a while teacher would not know when to
stop - but better at the end with beeps; very clever.
Kids enjoyed the characters; I would change red; it was good that kids
could react to characters.

Easier if teacher could work program by herself - hard to get; students
really enjoyed animals throughout; I still used board and red showed up
OK.

Kids really liked them; red was problem.

Excellent; red bad color.

Better once time to respond more accurate; kids enjoyed.
Red bad color; occasional muffled sound; content good, kids understood.



Teacher 
ID 



Practice Tests 



01 

02 
03 

04 



05 



06 



07 



09 
10 
11 
12 
13 
14
15 
16 
17 

18 
19 
20 
21 
22 
23 
24 

25 

26 

27 



Good except my kids would, have benefited from more practice - less 
film, F 
Pretty good. 

At first too easy, then too hard; worthwhile; got kids used to new 
format. 

Too long; kids were tired of tests; tests were too hard but after taking 
the ITB5 - I understand why it was so hard - but maybe it would be 
better to make it hard for real test and easy all year - during 
practice; just some minor problems with items. 
They got too hard too fast; I understand why it was hard - especially 
when I saw the real ITBS; tests came too fast at end and kids got tired 
of them; too long at end, 

Sometimes directions were typed wrong; once I went through one set - [ 
knew how to give all; students enjoyed them until the end; very pleased 
with program; ITBS practice test didn't do anything - to prepare kids - 
elementary. 

Too close - space but over year - should be viewed as part of 

curriculum; plan to use all practice tests next year - she will have 

some students in third grade, 

No problem. . 

Some better thjn others, a bit of a pain but did prepare kids. 

Here needed, \ \ 

Covered all the things kids needed, 



Format good, easy/to administer, provided good practice* 

Too hard, one child quit. Need to' 1 be easy to build confidence/ " 4 

Need greater differentiation, not easy enough for the slow kids. 

Too many, kids tired by end of project. Spent too much practice on easy 

concepts and not enough on hard concepts (reading comprehension). 

Kids did very well and enjoyed them, 

Some of the sounds were difficult, 

Fine. 

No problem? 
Fine. 

Fine; good for kids.
Kids enjoyed checking tests; they thought they were smart; checking was
really a reinforcement.
Didn't like the mistakes; I felt I had to proofread each test; kids
did like to take tests; they always did well.
Sometimes too difficult for low group; pictures were hard to
discriminate.

Medium and high were right; your tests were way too hard and it
frustrated them; most of low students did not read actual ITBS items;
they finished very quickly; a little long; kids were a lot more relaxed
for ITBS; they knew exactly what to do for reading; during math, kids
seemed confused and more nervous.



Very adequate. 

Enjoyed; weren't too hard, level appropriate.
Great. A bit hard for DISTAR kids.
Teacher
ID 



Contact with Project Staff 



Teacher 
ID 



General Impressions of the Project 



01 Fine.

02 Once we were in contact, things were faster.

03 Really good.

04 -

05 - 

06 Average; a few mixups on materials not arriving; would like to have had
more contact.

07 We got behind and felt pressured.

08 Great.

09 Fine.

10 The staff did all they could. 

11 Good. 

12 -
13 -

14 Adequate (team not familiar with the classroom; "ivory tower
syndrome").

15 Good. 

16 Very good. 
17 Good.

18 It has been delightful (good PR).

19 Great, very patient.

20 O.K. 

21 Good. 

22 Fine. 

M At first hard to catch the staff, but OK. 

33 Good. 

34 Instructions clear, never any questions. 

35 Very adequate.




01 
02 

03
04 

05 
06 



09 
09 

10 
11 
12 
13 
14 
15 
16 



25 
21 
28 
29 
30 
31 
32 



Data Collection 



Great.

They started laughing at one of the kids and ha to it; other than
that, O.K.
Really good. 

The test was discouraging after 4 days of testing; I wasn't bothered at
all by observers.
Didn't bother us. 

I gave the wrong test; students did not notice observers; students were 
tired of testing by Friday. 

Students did not notice the observers; students enjoyed test-wiseness;
students liked test administrator.

Observers talked and distracted kids. 

Caused some disruption but no big deal. Would still prefer no
observers.

Not even aware of them.
Fine. 



Fine; they were very quiet and didn't bother children.
Test-wiseness and attitude were too rushed on Friday (not enough time).
No problem, except one girl sitting close could hear tape (watched
observers).

With proper introduction, it went well, but by Friday (test-wiseness)
kids were too tired.

Sat in the front of the room and were somewhat disruptive. Children
were very aware they were there.
Very quiet.
Fine. 

No problem; the letter panicked everyone but was fine.
Good. 

Fine; were worried, but fine.

Very good observers; did not notice their presence.

I didn't know they were there; observers weren't trained on test
administration; kids were a little confused because observer used
different language.

Absolutely no disruption; very prepared.



Nice people.

No problem; felt it would be better with professional observers to give
test-wiseness test.
Totally unobtrusive.



20 
21 
22 

23 
24 
25 
26 



28 
29 
30 
31 
32 
33 
34 



Definitely worthwhile. 
Glad she did it.

Liked program; really need to teach this; kids had good feeling. 
Space more throughout year; once a month for each activity; took too
much time in a short span of time; I will do it again, but space it
throughout the year.

Project did help with ITBS; they learned format; kids did better on
reading than math and spelling; after ITBS kids realized why practice
tests were so hard; project took a lot of time at end; that was hard.
Really impressed with program; concepts in later filmstrips needed more
development and practice, and T's didn't have time to supplement; and
students really need to know about guessing; lf.5, in fall, I didn't
really know what I was going to do when I got back to my class; your
visit was important.

Still confused about what we were to do; wasn't clear on what to bring
to workshop II.
Enjoyed, worthwhile, kids more relaxed.
Kids did feel more comfortable but maybe too much! Generally
worthwhile; would use films again but fewer practice tests.
Can't answer until data in. Would rather it be 1 unit, not so
dragged out.

They seemed prepared for reading, but social studies and science threw
them. Kids disappointed it wasn't exactly like Owl.

Kids were very relaxed this year; easiest administration of SAT in 10
years of teaching; entire project easy to plan, prepare, and administer
for teacher. Need to use 20-30 minute blocks of time rather than 45-60
minute blocks, because you lose children after 20-30 minutes.
Need to work with teachers in planning the study. After "you" people
have been out of the classroom for two years, you're "no good." "It is
like England trying to rule the colonies." Sometimes toward the end
there was a lack of consideration for the students.
Although it was hectic at the end, the training really showed during the
test. Best test administration of SAT I've had in 8 years. Really helped
our English as a second language kids, who otherwise would have been
wiped out by this experience. "Wished we would have had training in math."
Have 2nd grade classroom teachers on planning staff and consult with
teachers as you go. Plan the same type of instruction for math.
The test-taking skills have generalized, and concepts such as learning to
eliminate have carried over. Kids seem better prepared to cope.
Excellent preparation for the test (kids really learned to proofread and
use these skills with other assignments).
Later films best; worthwhile.
Nothing additional.

Need to see results; if they did better, it was worthwhile; were some
skills they learned for other things.
Some films too long; pushing too much.

A whole outline to show us what to do (title page or table of contents);
surprised how relaxed kids were for ITBS reading; kids were
apprehensive about math; didn't know how to do items; fall workshop
didn't really explain the scope of the program. I knew what was
expected in the classroom but didn't know how far to extend into
classwork; project was enjoyable; only burden was getting behind.
Project was worthwhile - kids learned how to take tests; I will use it 
again. 



Children and I looked forward to it.
Later filmstrips super valuable, esp. "Eliminate" concept. Carried
through to other areas. Still thinks low students stay low.
Great benefits, kids more comfortable. 
Table 30 (continued)



Teacher
ID 



Reinforcement (Exp. I Only)



Teacher 
ID 



Spring Workshop (Exp. I Only) 



01 My kids not impressed.

02 Don't know for sure how much kids got out of it, but excited to take them
home. Never got hanging board.

03 I just couldn't get them to understand; they were proud of charts but
confused. Would use charts but easier to fill in next time.

04 Very negative because it was too time consuming; kids couldn't do it
without my help; it seemed to be a chore; better for 3rd and 4th grade.

05 Kids liked it; after you showed them how, they had no trouble; hard for
slow students who didn't get many points; more reinforcing for high
students.

06 -

07 Students didn't ever really understand procedures, even at end; hard for
teachers to explain to students; colorful; students liked chart;
discouraging for low students (too many "no points"); students should get
at least one point each time.

08 Good but not great; too many kids and lots of cheating.

09 Yuk; hating having them circle right answers.

10 The pits; didn't mean anything to kids.

11 A hassle; would rather mark charts up and down, not across. Didn't have
space in room; kids liked coloring.

14 Didn't care for it; too bulky and cards fell apart; children never spent
any free time with cards, only when instructed to after test.

15 Didn't work; wasn't reinforcement ("TO BEAT" was too high).

16 Scores not low enough; all excited to take them home.

17 Not very reinforcing; need to attach points to extra recess, etc.

18 Kids were proud of their charts (took them home).

19 Kids liked it.

20 OK.

21 Good except when kids got 0's. Liked coloring.



01 Fine.

02 Good, kids loved going back over questions after testing.

03 Good, excellent.

04 Very informative, excellent; without U.S., I would not have taken test
or analyzed the test; I know I should do it anyway, but I wouldn't
have.

05 Helpful to me; I wasn't so shocked when I gave the test; I could really
explain to kids that some items were hard and no one expected students
to get them all right.

06 Super, really good; we could have spent 2 days on it.

07 Most beneficial of all; really prepared T for giving test; ideas
presented were not in manual; taking the test taught me what to teach
kids for taking test.

08 Fine, learned a lot.
09 Good.

Left in middle cuz daughter had baby; maybe too minute in detail.
Good, learned a lot of new incorporations.

Unnecessary (could be done in 30 minutes).
"Defeated purpose of your study" (confused the question of student
improvement by changing teachers; should let teachers do it normally).
Was a little long, but learned three valuable lessons: (1) stand or sit
in front of room rather than roaming around; (2) make notes and
observations of students during the test; (3) carefully go over
directions and EM IN DETAIL.
Waste of time.

Didn't attend.
Didn't attend.

20 Interesting.

21 Clear and useful.



16 



17 
18 

administered the student training. Staff members who had personal
contact with the teachers through the project rated each teacher on a
scale of 1 to 3 for both support and quality of implementation before any
other project data were collected. The criteria for each rating were
as follows:

SUPPORT FOR PROGRAM:

3 = Seldom complained, receptive to necessary change, attended
workshops, eager to cooperate, punctual with materials, very
positive over the phone and when observed.

2 = Occasionally complained, somewhat resistant to change, partial
attendance at workshops, cooperated, generally punctual with
materials, occasionally apathetic but not antagonistic when
observed.

1 = Always complaining, very resistant to change, failed to attend
workshops, little cooperation, generally negative attitude over
the phone and when observed.

QUALITY OF IMPLEMENTATION:

3 = Always on schedule with filmstrips and tests, returned materials
in proper order, no major deviations from implementation (test
administration, reinforcement, etc.), followed directions when
observed.

2 = Close to schedule with filmstrips and tests, returned almost all
materials (some with mistakes), moderate deviations in
implementation (changed "TO BEAT" scores, etc.), classroom
observations were fair.

1 = Seldom on schedule, missing materials, materials received had
major errors (did not use reinforcement charts, etc.),
observations poor.

Results, summarized in Table 31, indicate that, in general, teachers
demonstrated strong support for the project (X̄ = 2.53) and
implemented the components in a high-quality manner (X̄ = 2.42).
Table 31

Mean Ratings Given Teachers for Support and Quality

          Percent of Teachers          Mean Rating
          Selected for Each Rating     District                      Experimental Group
          3      2      1              Cache    Granite   Nebo       E1        E2         Total
                                       (n = 5)  (n = 25)  (n = 8)    (n = 21)  (n = 17)   (n = 38)
Support   57.9   36.8   5.3            3.0      2.5       2.4        2.6       2.5        2.5
Quality   57.9   26.3   15.8           2.8      2.4       2.3        2.5       2.4        2.4
INSTRUMENTATION

To determine the effects of the experimental treatment, a variety of
dependent variables were considered that provided information about both
students' and teachers' performance during the standardized test
administration. Data collected included standardized achievement test scores
from the test used in each district and a variety of locally developed
measures that examined such variables as student and teacher on-task behavior
during the testing, the quality of test administration, student and teacher
attitude towards testing, and student test-wiseness skills. The remainder of
this section provides a brief description of each of these dependent
variables.

Standardized Achievement Tests

The major objective of this project was to provide an intervention that
would result in more valid test scores. Consequently, it is logical to
examine the standardized achievement test scores of children in the
experimental and control groups to determine whether there are differences
between the scores. If the experimental treatment resulted in more valid
scores, then one would expect children in the experimental treatment to score
differently on the average than children in the control groups. Because each
district included in the study was using a different standardized achievement
test in its district testing program, it was necessary to convert the scores
to a standard metric before including them in the analysis. Z score
transformations (Glass & Stanley, 1970, p. 87) were computed for each
student's score using the following formula:

$$Z_{ij} = \frac{X_{ij} - \bar{X}_{j}}{SD_{j}}$$
where $X_{ij}$ is the raw score of the ith student in district j (Granite,
Nebo, or Cache), and the mean and standard deviation are those of district j.
Z scores were thus computed within each district. In other words, the mean
and standard deviation of all of the participating students' scores in Granite
were computed and used in conjunction with each individual student's score to
compute a Z score, and likewise for Nebo and Cache. Since each district had
approximately the same number of Experimental Group I, Experimental Group II,
and control students, this procedure yielded a score that could be combined in
one total analysis even though the districts used different standardized
achievement tests.
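The within-district standardization described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual procedure: the record layout and student IDs are hypothetical, and because the report does not say whether the sample or the population standard deviation was used, the sketch assumes the population form (`statistics.pstdev`).

```python
from collections import defaultdict
from statistics import mean, pstdev

def within_district_z(records):
    """Convert raw scores to z scores computed separately within each district.

    records: iterable of (student_id, district, raw_score) tuples (hypothetical layout).
    Returns {student_id: z_score}.
    """
    records = list(records)
    scores_by_district = defaultdict(list)
    for _, district, score in records:
        scores_by_district[district].append(score)

    # Mean and SD of all participating students in each district.
    # Assumption: population SD; the report does not specify which SD was used.
    stats = {d: (mean(s), pstdev(s)) for d, s in scores_by_district.items()}

    return {
        student: (score - stats[district][0]) / stats[district][1]
        for student, district, score in records
    }

# Hypothetical raw scores from two tests on different scales.
records = [
    ("g1", "Granite", 40), ("g2", "Granite", 50), ("g3", "Granite", 60),
    ("c1", "Cache", 10), ("c2", "Cache", 20),
]
z = within_district_z(records)
print(z["c1"], z["c2"], z["g2"])  # -1.0 1.0 0.0
```

Because each district's scores are centered and scaled separately, a z of 1.0 means "one standard deviation above that district's mean" regardless of which test produced the raw score, which is what allows the pooled analysis.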

Each standardized achievement test was administered by the classroom
teachers in the individual districts, which is the procedure normally followed
in each of the three districts participating in the project. Granite and Nebo
districts administered the tests the week of March 29th to April 2nd, and
Cache District administered the test the week of April 5th to 9th. All tests
were scored by the respective publishing companies and returned to the
district offices, which then made scores available to the research staff.
Experimental Group II and Control Group teachers were instructed to follow the
normal procedures in their district for administering the test.

Students in Cache District completed the most recent version of the
Comprehensive Tests of Basic Skills, Form U/Level D, Grades 1.6-2.9 (CTBS,
1981). The battery is made up of 10 subtests in six content areas (see Table
32), of which three focus on reading: word analysis, vocabulary, and reading
comprehension. This version of the CTBS was piloted in 1979, and
standardization was conducted in the fall of 1980 and spring of 1981.
Reliability data are available in the test coordinator's handbook.

Students in the Nebo District completed the Iowa Tests of Basic Skills,
Form 7/Level 8 (ITBS, 1980). The battery is made up of 15 subtests in seven
skill areas (see Table 32), of which five focus on reading: vocabulary, word
analysis, picture comprehension, sentence comprehension, and story
comprehension. According to Buros' Mental Measurements Yearbook (8th edition),




Table 32



Standardized Test Formats 



TEST SUBTESTS 



Teacher 
Directed 



Student 
Directed 



# Items    Minutes



CTBS Word Attack 
1981 Vocabulary 

Reading Comprehension

Spelling 

Language Mechanics 
Language Expression 
Mathematics Computation 
Mathematics Concepts 
Science

Social Studies 



ITBS Listening 
Vocabulary
Word Analysis
Reading Comprehension
Pictures
Sentences
Stories
Language Skills
Spelling
Capitalization
Punctuation
Usage
Work Study Skills
Visual Materials
Reference Materials
Mathematics Skills

Mathematics Concepts
Mathematics Problems
Mathematics Computation

SAT Vocabulary

1973 Reading Part A
Reading Part B
Word Study Skills A
Word Study Skills B
Math Concepts
Math Computations
Math Applications
Spelling 
Social Science 
Science 

Listening Comprehension 



X 
X 
X 



X 



X 

© 

X 
X 
X 
X 



® 

X 
X 




40 
25 
25 
25 
20 
25 
20 
30 
25 
25 



32 
20 
57 

23 
16 
28 

29 
75 
68 
23 

32 
38 

36 
24 
28 



37 
45
48 
30 
35 
35 
37 
28 
43 
27 
27 
50 



38
19
28 
17 
15 
27 
18 
33 
28 
28 



16 
14 

20 

12 
7 

15 

13 
12 
13 
9 

24 
25 

15 
18 

22



20 
20 
25 
10 
15 
20 
30 
20 
25 
20 
20 
35 



Circles indicate those subtest scores during which student and teacher on-task 
observational data were collected. 
the split-half reliability coefficients on composite scores and equivalent
forms range from .60 to .94. The intercorrelations among the subtests range
from .69 to .83, with a median of .76.

Students in the Granite School District used the Stanford Achievement
Test, Form A/Primary Level 2 (SAT, 1973). The test is made up of 12 subtests in
eight content areas, with five subtests focusing on reading: vocabulary,
reading part A, reading part B, word study skills A, and word study skills B
(see Table 32). Standardization of the SAT has been extensive, with split-half
reliability coefficients generally reported in the high .80s to mid .90s.

Five scores were recorded for each student. Because the student training
focused on the content area of reading, three different reading scores were
obtained: a teacher-directed test (i.e., a test in which each item was read
by the teacher and the timing of the test was paced by the teacher), a
student-directed test (i.e., a test in which the teacher gives the directions and
then gives students a specified time limit to work a number of problems at
their own pace without further directions), and the total reading test. The
subtests selected for the teacher-directed and student-directed tests in each
district are circled in Table 32. In addition to these two subtest scores, each
student had a total reading, a total math, and a total test score recorded,
for five scores in all.

The rationale for recording student-directed and teacher-directed scores
separately was twofold. First, the project staff felt that the
skills taught in the filmstrips were very different for teacher-directed and
student-directed tests. Second, as will be noted below, the on-task
observations of students and teachers took place during the same
student-directed and teacher-directed subtests that were included in this
analysis. In this way, the relation between on-task behavior and student
scores could be observed.
Student and Teacher On-Task Behavior

Student and teacher testing behaviors that are both appropriate and
thought to produce more valid scores are frequently outlined in test
administration manuals. These behaviors, particularly those of the teacher,
are usually specified as "standardized procedures." Adherence to standardized
procedures is necessary to obtain comparable, normative data. Unfortunately,
few data show that these preferred behaviors actually do influence the
validity of test scores. Additionally, even though it is assumed that
teachers follow certain (e.g., standardized) testing behaviors, there is no
evidence demonstrating that these behaviors are being displayed. That is, are
teachers and students really doing what the teacher's manual and other
documents specify as "good practice"? Questions such as "What is 'on-task'
during testing?", "What do students and teachers really do during testing?",
and "Do certain student and teacher behaviors affect test results in and of
themselves?" have not been answered. Two instruments, described below, were
developed to gather data about these questions. Specifically, the following
questions were addressed:
1. Do teachers follow the directions prescribed in test manuals to
establish appropriate environments prior to, during, and after test
administration?

2. Do teachers in different experimental groups implement procedures to
various degrees depending on treatment conditions?

3. Do students attend to teacher directions and the test items during
testing?

4. Do students in different experimental groups attend to tasks in
varying degrees depending on the treatment conditions?
Two instruments were devised by project staff to collect data that would
describe classroom behaviors during testing. One measure was a checklist of
items that observers checked off as testing activities were initiated. The
other measure was an interval recording system for collecting data on the
on-task behavior of students and teachers. Similar versions of both the
checklist and the observation recording form had been developed under another
project. The next sections describe the original instruments, the preliminary
revisions, the pilot tests, and the final revisions.

Original instruments. Initially, the Quality of Test Administration
Checklist consisted of a list of activities that were initiated by the
teacher and occurred prior to, during, and after test administration. The
list was generated from test administration manuals, research on classroom
teaching techniques, and textbooks on psychometrics and test administration.
Data were collected by pairs of trained observers who checked off items as
they were observed during group standardized testing.

The instrumentation to collect on-task data originally included an interval
recording form and extensive definitions of teacher and student on-task
behavior during testing and teacher contact. Data were collected by pairs of
trained observers in conjunction with the checklist data. Mean interrater
agreement for this version was .88, with a range of .74 to .97.

Pilot test. Both the checklist and the behavioral observation system, as
well as the observer training, were piloted for use with this project with a
group of 10 graduate students in a research class. The students were trained
and then collected data during testing situations in several second-grade
classrooms.

Final revisions. As a result of the pilot test, several major changes
were made in both instruments. Changes in the checklist included rewording
some items to be observable behaviors, adding subjective items that would
gauge a general negative or positive climate, and writing exact directions for
the observers. Changes in the behavioral observation system involved writing
new definitions for "on-task" behavior, distinguishing between teacher-directed
and timed tests, and adding notations for when students finished the timed
test. A detailed description of each instrument (as it was used in this
study) is provided below.

Checklist. A copy of the final checklist used with this project is
included in Appendix G, and test statistics are reported in Table 33.
The checklist is divided into three sections: teacher behavior before
administering the test (16 items), teacher behavior during test
administration (15 items), and questions concerning the classroom
arrangement and atmosphere (8 items).

In addition to the 31 items (the 16 before-test and 15 during-test items)
that related directly to the quality of test administration as specified in
the teacher's administration manual, other information was collected that was
thought to affect the quality of test administration, such as disruptive
occurrences during the test, noticeable cheating, seating arrangements, and
the teacher's use of the aide.

Typically, observers would check some items during their observation
and some items after leaving the room. The checklist was always used by
pairs of observers during standardized testing. Interrater agreement was
computed for each classroom using the equation

$$\text{Agreement} = \frac{\text{Number of agreements}}{\text{Total number of items}} \qquad (1)$$

for an overall mean of .91 (SD = .095).
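Equation (1) is a simple proportion of items on which the two observers in a pair marked the checklist identically. A minimal Python sketch follows; the function name and the 0/1 coding of checklist marks are illustrative assumptions, not the project's scoring materials.

```python
def percent_agreement(observer_a, observer_b):
    """Agreement = number of agreements / total number of items (equation 1)."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("both observers must code the same, non-empty item list")
    # An agreement is any item both observers coded identically
    # (both checked or both left unchecked).
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return agreements / len(observer_a)

# Hypothetical marks (1 = checked, 0 = not checked) on a 10-item checklist.
a = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
b = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
print(percent_agreement(a, b))  # 0.9
```

Averaging this per-classroom figure across classrooms gives an overall mean of the kind reported above.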

Behavioral observation. A review of the literature and previous
observations contributed to a list of appropriate student behaviors most
conducive to producing high levels of attention to academic tasks.




Table 33

Test Statistics on Data Collection Instruments Developed by Project

                              Number                               Standard Error   % Agreement Between
Instrument                    of Items  Mean   SD    Reliability(a)  of Measurement   Observers (Mean, SD)
On-Task Behavior              N/A       89.8   11.8  N/A             3.66             90.4, 6.0
Quality of Teacher Test
  Administration              31        51.1   6.3   .82             2.62             90.6, 9.5
Teacher Attitude              30        87.7   12.4  .89             3.97             N/A
Student Attitude              8         12.0   3.6   .80             1.51             N/A
Test-wiseness                 38        21.3   5.0   .75             2.50             N/A

(a) Reliability estimates computed using Hoyt's measure of internal
consistency (Hoyt, 1941; Magnusson, 1967, p. 117).
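Hoyt's method estimates reliability from a persons-by-items analysis of variance, as 1 minus the ratio of the residual mean square to the between-persons mean square. It gives the same value as coefficient alpha. The classical standard error of measurement is then SD times the square root of (1 - reliability). The Python sketch below illustrates both. It assumes a complete score matrix with no missing responses, and its function names are hypothetical. As a check, the Test-wiseness row of Table 33 (SD = 5.0, reliability = .75) yields the reported SEM of 2.50.

```python
import math

def hoyt_reliability(scores):
    """Hoyt's ANOVA estimate of internal consistency.

    scores[p][i] is person p's score on item i (complete matrix assumed).
    Reliability = 1 - MS_residual / MS_persons from a persons x items ANOVA.
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[i] for row in scores) / n_p for i in range(n_i)]

    ss_persons = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_items = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_persons - ss_items  # person x item interaction

    ms_persons = ss_persons / (n_p - 1)
    ms_residual = ss_residual / ((n_p - 1) * (n_i - 1))
    return 1 - ms_residual / ms_persons

def standard_error_of_measurement(sd, reliability):
    """Classical SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical right/wrong (1/0) scores: 4 students x 3 items.
scores = [[1, 1, 1], [1, 0, 1], [0, 0, 1], [0, 0, 0]]
print(round(hoyt_reliability(scores), 2))        # 0.75
print(standard_error_of_measurement(5.0, 0.75))  # 2.5
```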



Definitions of what to consider on- and off-task during testing were
derived from this review. Hence, students were considered on-task when
looking at the teacher or their test booklet (during teacher-directed or
timed subtests) and when following directions. Students were off-task
when they displayed any other behavior. A third category of student
behavior, "probably on-task," was observed to accommodate those gray
areas in which observers could not be precise in their "on-task" coding.
This situation would occur when students appeared to be following
directions although they were looking away from the teacher or test booklet.

Standardized testing procedures listed in the testing manuals and
preliminary observations during another project formed the basis for
defining teacher on-task behavior. Actions consistent with attending to
the students' behavior at all times (while directing the test
administration under standardized conditions) are defined as on-task
behaviors. For example, during a timed test a teacher is on-task when
orally reading directions but is off-task when just talking to the entire
class. Essentially, teachers were on-task when reading aloud from the
manual or watching the students from the front of the room. Behavior
definitions for both students and teachers are summarized in Figure 4.

An interval recording form was used to collect the on-task data on
both students and teachers (see Appendix G for a copy of the form and
Table 33 for test statistics). Observers were paired for each observation
and collected data on five students and one teacher during each
observation. Student names were not used on the observation form. Instead,
observers randomly selected five students in each classroom and noted a
physical characteristic at the top of each student column so that they
could move from child to child as quickly as the intervals indicated.

Data recording began when the test administrator started reading
the directions and ended when the subtest was completed. Five-second
intervals consisted of 3 seconds to observe and 2 seconds to record.
Observers watched each child for 4 consecutive intervals, or 20 seconds (4
intervals × 5 seconds = 20 seconds), before moving to the next student.
Data were recorded on five students and one teacher (six subjects) for a
total of 2 minutes (6 × 20 seconds) before repeating the cycle.
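The timing arithmetic above can be made concrete by generating the observe/record cues for one 2-minute cycle. The Python sketch below is illustrative only. It assumes the five students are observed first and the teacher last, because the report does not state the order within a cycle. The subject labels are hypothetical.

```python
OBSERVE_S = 3                        # seconds spent observing
RECORD_S = 2                         # seconds spent recording
INTERVAL_S = OBSERVE_S + RECORD_S    # one 5-second interval
INTERVALS_PER_SUBJECT = 4            # 4 x 5 s = 20 s per subject

# Assumption: students first, teacher last (order not given in the report).
SUBJECTS = ["student 1", "student 2", "student 3", "student 4", "student 5", "teacher"]

def cycle_cues(subjects=SUBJECTS):
    """Return (second, subject, cue) tuples for one cycle, plus the cycle length."""
    cues = []
    t = 0
    for subject in subjects:
        for _ in range(INTERVALS_PER_SUBJECT):
            cues.append((t, subject, "observe"))
            cues.append((t + OBSERVE_S, subject, "record"))
            t += INTERVAL_S
    return cues, t

cues, length = cycle_cues()
print(length)     # 120 (6 subjects x 20 seconds = 2 minutes)
print(len(cues))  # 48 (24 intervals, each with an observe and a record cue)
print(cues[:2])   # [(0, 'student 1', 'observe'), (3, 'student 1', 'record')]
```

A cue list like this corresponds to what the prerecorded tape signaled to both observers at once.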

Data were placed on the recording form at a signal from a tape
recording that indicated when to observe (3 seconds) and when to record
(2 seconds). Portable tape players were equipped with earphones for two
people to use simultaneously, which kept the paired observers on the same
intervals and so facilitated interrater agreement calculation. Recording
started when the teacher began the directions, and observers marked each cell
as on-task (1), off-task (0), or probably on-task (-). Each of the five
students and the one teacher was observed for 20 consecutive seconds, or four
cells, during each 2-minute

Figure 4. Basic Definitions for On-Task Behavior

STUDENTS

Teacher-directed test
  1 (Definitely on-task): Following directions given by teacher, with eyes
    focused on teacher or test booklet.
  - (Probably on-task): Could be following directions, but eyes not focused
    on teacher or test booklet while teacher reads directions or after
    students finish item.
  0 (Definitely off-task): Not following directions given by teacher, or
    misbehaving.

Timed test, during directions
  1, -, 0: As above.

Timed test, during timing
  1 (Definitely on-task): After test starts until teacher says stop,
    students must be looking at test booklet or teacher.
  0 (Definitely off-task): Not looking at teacher or test booklet, or
    misbehaving (out of seat, talking aloud).

TEACHER

  1 (On-task): During directions or teacher-directed items, must be in
    front of the room. When not reading directions or items, teacher is
    either looking at students or assisting a student.
  0 (Off-task): Not in front of the room while reading to students from
    manual. Looking at something other than students during timed test.
block of time. At the end of 2 minutes, or once across one row of
interval cells, a 5-minute pause gave observers a chance to make notes
and locate their position with the first child again before starting the
next 2-minute observation. Observers computed percent on-task by
dividing the number of "1" marks by the total number of intervals. All
computations were checked by a second person and errors corrected.
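Read literally, the percent on-task rule counts only "1" marks in the numerator. "Probably on-task" (-) cells and off-task (0) cells both count in the denominator but not the numerator. A minimal Python sketch of that rule follows; the function name and the string codes are illustrative assumptions.

```python
def percent_on_task(cells):
    """Percent of recorded intervals coded "1" (definitely on-task).

    cells: codes from the recording form: "1" on-task, "0" off-task,
    "-" probably on-task. Only "1" counts toward the numerator; every
    recorded interval counts in the denominator.
    """
    if not cells:
        raise ValueError("no intervals recorded")
    return 100 * sum(1 for c in cells if c == "1") / len(cells)

# Hypothetical codes for one student over two 20-second blocks (four cells each).
cells = ["1", "1", "-", "0", "1", "1", "1", "1"]
print(percent_on_task(cells))  # 75.0
```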

Procedures for data collection. Personnel hired to collect data were
members of Title I Parent Advisory Councils in Cache, Granite, and Nebo school
districts. A total of 22 observers were hired at $5.00 per hour, including
training, data collection, and travel to schools.

Data collectors were trained simultaneously to administer both the
behavioral observation form and the checklist. Training consisted of three
segments: (a) practice with videotaped scenes of classroom testing, (b)
practice in the classroom during actual testing, and (c) retraining. An
outline of the initial training segment, conducted from 9:00 to 3:00 on March
26, 1982, is included in Appendix G. The data collectors were kept naive of
the experimental design and research questions. Basically, the training
sessions led them through each component of the observation and checklist
procedure. They rehearsed data collection, used videotaped scenes, and
practiced setting up equipment. A list of observation procedures is located
in Appendix G.

The schedule of classroom practice and actual data collection is located
in Appendix G. Two subtests were observed in each teacher's classroom: a
teacher-directed test and a student-directed (timed) test (Table 32 indicates
the subtests observed in each district). In all cases, the teacher-directed
test was given before the timed test. Observers were assigned by pairs to
classrooms. One or two data collectors were designated as substitutes each
day. Dates and times for actual classroom observation were randomly assigned
across experimental groups and districts (see Table 34 for this breakdown).

Observers were assigned in pairs to classrooms to practice data
collection on the first testing day in each district (March 29 in Granite and
Nebo and April 5 in Cache). Classrooms selected for the practice sessions are
listed in Appendix G, and their data were not included in the analysis.
Practice was provided on two tests: a teacher-directed and a student-directed
test (see Table 32 for the subtests in these categories).

Prior to the practice, observers watched one subtest being administered
to get a "feel" for the classroom situation; they recorded no data.
Observers then recorded behavior on the next two subtests as described above.
Data collected during the classroom practice obtained an overall interrater
agreement of 86.8% for on-task behavior observations (84.8% with the
teacher-directed test and 88.9% with the student-directed test) and 93.9% for
the Quality of Test Administration Checklist (see Tables 35 and 36 for a
breakdown by district).

A retraining session was held during the afternoon of the practice data
collection on Monday. At this time, definitions were clarified, disagreements
among observers were resolved, and forms were checked by staff personnel for
completeness and accuracy.

Actual observations began on Tuesday (March 30 for Granite and Nebo and
April 6 for Cache), the second test day, and continued through Thursday of the
same week. Observers were randomly assigned to different pairs each day to
observe both a teacher-directed and a student-directed test. These tests were
administered consecutively, and their order was randomly assigned across
teachers. Each day one observer was not assigned to a classroom and was
available as a substitute in case an assigned observer did not show up.
Table 34

BREAKDOWN FOR OBSERVATIONS
BY NUMBER OF CLASSES

                           PRACTICE  ACTUAL OBSERVATIONS                    ATTITUDE AND TEST-WISENESS
                   Number  Monday    Tuesday     Wednesday   Thursday       Friday
Test/District Group of     8-11      8-10 10-12  8-10 10-12  8-10 10-12     9-10 10-11 11-12 12-2
                   Classes

SAT/Granite   E1     14
              E2     11
              C       8
              Other  [?]

CTBS/Cache    E1      3
              E2      2
              C       5
              Other   3

ITBS/Nebo     E1      4
              E2      4
              C       7

TOTALS               58    9         10   8      10   8      9    9         16   16    11    15

Note. Session-by-session counts for individual groups, and the number of
Granite "Other" classes, are illegible in the source copy; only group sizes
and session totals are reproduced.



Table 35
PRACTICE DATA COLLECTION

Percent of Interrater Agreement for Quality
of Test Administration

District     N     Percent Agreement     SD

Cache        3     93.0                  7.0
Granite      6     94.5                  6.0
Nebo         2     93.5                  4.9
OVERALL     11     93.9                  5.6



Table 36
PRACTICE DATA COLLECTION

Percent of Interrater Agreement for On-Task Behavior
During Teacher and Student Directed Tests

                 Teacher     Student
District         Directed    Directed    Overall

Cache            82.0        91.0        86.7
  n = 3          (14.7)      (9.0)       (12.1)

Nebo             75.3        84.5        86.8
  n = 2          (8.1)       (1.1)       ([?].4)

Granite          89.3        89.2        89.2
  n = 6          (4.4)       (7.0)       (5.6)

All Districts    84.8        88.9        86.8
                 (9.6)       (6.9)       ([?].4)

Note. Numbers in parentheses are the standard deviations; [?] marks a digit
that is illegible in the source copy.
Substitutes who did not need to fill a vacancy computed mean on-task
percentages and interrater agreement. Interrater agreement, computed using
Equation 1, was 90.6% for observations and 90.44% for the checklist. These
data are reported by district in Tables 37 and 38 and summarized in Table 33.
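The text does not say how the OVERALL rows of the agreement tables were derived. The arithmetic below, using the district values printed in Tables 35 and 37, shows that they are consistent with means weighted by the number of classrooms (N) in each district; treat this as an inference, not the authors' stated procedure.

```python
# Check whether each OVERALL row equals the N-weighted mean of the district rows.
# Values are copied from Tables 35 and 37 (Cache, Granite, Nebo).


def weighted_mean(rows: list[tuple[int, float]]) -> float:
    """rows: (n, percent) pairs; returns sum(n * percent) / sum(n)."""
    total_n = sum(n for n, _ in rows)
    return sum(n * pct for n, pct in rows) / total_n


table_35 = [(3, 93.0), (6, 94.5), (2, 93.5)]        # practice, N = 11
table_37 = [(10, 97.70), (31, 88.96), (13, 88.38)]  # actual, N = 54

print(round(weighted_mean(table_35), 1))  # 93.9 (OVERALL in Table 35)
print(round(weighted_mean(table_37), 2))  # 90.44 (OVERALL in Table 37)
```

An unweighted average of the three district percentages would not reproduce these OVERALL values, which is why the weighting by N matters here.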

Dependent measures. For the Quality of Test Administration Checklist,
the number used in the analysis is the percentage of the 31 items scored as
"occurring" in the classroom. For the behavioral observations, percentage of
time was used in the final statistical analysis. Two percentages were
computed separately for both teacher and student on-task behavior: on-task
behavior during teacher-directed tests and during student-directed (timed)
tests. Computations of on-task behavior combined the percentages for
definitely on-task and probably on-task because interrater agreements were
comparable whether these scores were separated or combined.
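A minimal sketch of the two dependent measures, assuming hypothetical interval codes "D" (definitely on-task), "P" (probably on-task), and "O" (off-task); the actual symbols used on the observation form are not given in this section.

```python
# Hypothetical sketch of the checklist and on-task dependent measures.
# Assumed interval codes: "D" = definitely on-task, "P" = probably on-task,
# "O" = off-task.

CHECKLIST_ITEMS = 31  # items on the Quality of Test Administration Checklist


def checklist_percent(items_occurring: int, total_items: int = CHECKLIST_ITEMS) -> float:
    """Percent of checklist items scored as "occurring" in the classroom."""
    if not 0 <= items_occurring <= total_items:
        raise ValueError("count outside the possible range")
    return 100.0 * items_occurring / total_items


def combined_on_task_percent(codes: list[str]) -> float:
    """Percent of intervals coded definitely OR probably on-task."""
    if not codes:
        raise ValueError("no intervals recorded")
    on_task = sum(1 for c in codes if c in ("D", "P"))
    return 100.0 * on_task / len(codes)


# Separate percentages for the two kinds of subtest, as in the text.
teacher_directed = ["D", "D", "P", "O", "D", "D", "P", "D"]
student_directed = ["D", "D", "D", "P", "D", "D", "D", "D"]

print(round(checklist_percent(25), 1))             # 80.6
print(combined_on_task_percent(teacher_directed))  # 87.5
print(combined_on_task_percent(student_directed))  # 100.0
```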

Locally Developed Instruments 

As noted above, data were also collected about teacher and student
attitude and student test-wiseness. Because no appropriate instruments could
be identified in these areas, project staff developed instruments, pilot
tested them, and revised them as necessary for use in the project. Table 33
contains descriptive information, including the number of items on the three
measures, the mean and standard deviation, reliability, and standard error of
measurement for each instrument. This information will be helpful in
interpreting the results of these tests in the Results section. A brief
summary of the content and development procedures for each instrument is
given below.
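Table 33 itself is not reproduced in this section. If its standard errors of measurement follow the classical test theory definition (an assumption), each one can be obtained from an instrument's standard deviation and reliability as sketched below; the standard deviations used here are hypothetical, while the reliabilities .89 and .75 are the estimates quoted later in the text.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if sd < 0 or not 0.0 <= reliability <= 1.0:
        raise ValueError("sd must be >= 0 and reliability must be in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical SDs of 10 and 4 score points.
print(round(standard_error_of_measurement(10.0, 0.89), 2))  # 3.32
print(standard_error_of_measurement(4.0, 0.75))             # 2.0
```

The lower the reliability, the larger the band of error around each observed score, which is one reason the reliability column matters when reading the Results section.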

Teacher attitude towards standardized tests. It is not uncommon for
classroom teachers to feel fairly negative about standardized achievement
tests. Although standardized achievement tests can cause many problems, it
was our conviction that, properly administered and interpreted, standardized




Table 37

ACTUAL DATA COLLECTION

Percent of Interrater Agreement for Quality
of Test Administration

District     N     Percent Agreement     SD

Cache       10     97.70                  3.56
Granite     31     88.96                 11.16
Nebo        13     88.38                 10.86
OVERALL     54     90.44                  9.51

Table 38

ACTUAL DATA COLLECTION

Percent of Interrater Agreement for On-Task Behavior
During Teacher and Student Directed Tests

                             TUESDAY        WEDNESDAY      FRIDAY         DISTRICT
Test       District          8:00   10:00   8:00   10:00   8:00   10:00   MEAN

TEACHER    Cache             94.5   95.0    [?]0.5 [?].0   90.5   90.0    90.8
DIRECTED     n = 10          (1)    (1)     ([?])  ([?])   (2)    (1)     (4)
           Granite           84.5   86.6    91.7   92.3    89.6   88.4    88.6
             n = 30          (7)    (7)     (4)    ([?])   (6)    (7)     (6)
           Nebo              85.9   86.5    86.1   82.3    87.5   87.9    86.0
             n = 13          (5)    (7)     (4)    (6)     (9)    (8)     (5)
           ALL               86.7   87.7    89.9   88.2    89.3   88.4    88.7
             n = 53          (7)    (7)     (5)    (6)     (5)    (6)     (13)

STUDENT    Cache             92.5   99.0    [?]    95.5    95.5   96.0    94.5
DIRECTED     n = 10          (1)    (1)     (4)    (4)     (1)    (1)     (3)
           Granite           91.2   95.2    96.3   95.3    90.8   94.2    93.8
             n = 31          (3)    (4)     (2)    (5)     (7)    (4)     (4)
           Nebo              90.5   94.3    92.2   94.8    84.5   78.2    89.2
             n = 13          (3)    (4)     (4)    (2)     (11)   (27)    (10)
           ALL               91.2   95.4    94.5   95.1    90.7   91.1    92.8
             n = 54          (3)    (4)     (3)    (3)     (6)    (13)    (6)

OVERALL      n = 107         88.9   91.5    92.2   91.9    89.8   89.4    90.6
                             (6)    (6)     (5)    (6)     (6)    ([?])   (6)

Note. Numbers in parentheses are the standard deviations; [?] marks an entry
or digit that is illegible in the source copy.
achievement tests can provide a valuable tool for the educational process.
Furthermore, we hypothesized that teachers' attitudes toward standardized
testing would change as they understood more about the purpose of standardized
achievement testing, felt that students had been adequately prepared to take
the test, and became more skilled in administering the test. An extensive
search for an instrument measuring teacher attitudes towards standardized
achievement tests yielded only one instrument that was reasonably close to
what we needed in the project (Beck & Stetz, 1979; Stetz & Beck, 1979). This
instrument was used as a basis to refine and develop an instrument in which
teachers were asked to respond to Likert-type items in five categories:
general opinion, attitude toward administering tests, usefulness of tests,
students' feelings about tests, and whether tests should be used more
frequently. The total scale consisted of 35 items. The prototype of the
instrument was critiqued by project staff members and other testing experts
at Utah State University and then pilot tested on an individual basis with
four second-grade Logan District teachers. Following this pilot test,
revisions were made: some items were added and directions were clarified. The
test was then administered to two classes of 35 teachers who were attending an
in-service training program sponsored by Utah State University. Each of these
teachers was currently teaching in Granite School District, although none were
participating in the test-taking skills project. Item analyses were computed
for each of the classes, and point-biserial correlations and difficulty levels
were used to further refine and improve the test. As noted in Table 33, the
final reliability coefficient estimate was .89, and scores were reasonably
well distributed. The actual instrument used to collect data on teacher
attitude is included in Appendix G along with item statistics from the three
groups participating in the study (the format of the questionnaire as it
appears in Appendix G has been changed slightly to accommodate the display of
item statistics).
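The item statistics named above can be sketched as follows for items scored 1 (correct or keyed response) and 0. Difficulty is the proportion scoring 1, and the point-biserial is the Pearson correlation between the item and the uncorrected total score. How the project's item analysis program treated Likert-type items is not stated, so this is an illustration for dichotomous items only, and the function names are hypothetical.

```python
import math


def item_difficulty(item: list[int]) -> float:
    """Proportion of examinees scoring 1 on the item."""
    return sum(item) / len(item)


def point_biserial(item: list[int], totals: list[float]) -> float:
    """Point-biserial correlation between a 0/1 item and total scores.

    Uses r_pb = (M1 - M) / SD * sqrt(p / q) with the population SD of the
    totals, which equals the Pearson correlation of item and total.
    """
    n = len(item)
    p = item_difficulty(item)
    if p in (0.0, 1.0):
        raise ValueError("undefined when every examinee scores the same")
    mean_total = sum(totals) / n
    sd_total = math.sqrt(sum((t - mean_total) ** 2 for t in totals) / n)
    mean_scoring_1 = sum(t for i, t in zip(item, totals) if i == 1) / sum(item)
    return (mean_scoring_1 - mean_total) / sd_total * math.sqrt(p / (1.0 - p))


item = [1, 1, 1, 0, 0]          # hypothetical responses of five examinees
totals = [10.0, 9.0, 8.0, 5.0, 4.0]
print(item_difficulty(item))                   # 0.6
print(round(point_biserial(item, totals), 3))  # 0.952
```

Items with a low or negative point-biserial do not rank examinees the way the total score does, which is the usual basis for revising or dropping them.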
This measure of teacher attitude towards testing was delivered to
teachers by trained observers, who also administered the student attitude form
described below. The questionnaire was administered on Friday of the week in
which the district did standardized achievement testing and was then picked up
by the observer (see Table 34 for schedule). Each teacher filled out the
questionnaire independently (requiring approximately 10 to 15 minutes) while
the student test-wiseness and attitude measures were being administered. No
problems were noted by the observers in collecting the data, and all teachers
completed the questionnaire.

Student attitude towards standardized tests. A second major objective of
the project was to reduce the anxiety that many students feel during
standardized achievement testing and to make standardized achievement testing
a less threatening and more comfortable experience. No measures for assessing
second-grade students' attitudes towards standardized achievement testing
could be located. Therefore, the project developed a measure, which was
administered by the same people who collected the on-task data during the
testing period. These data were collected on Friday of the week in which the
standardized achievement testing was done, so that the testing experience was
still fresh in students' minds (see schedule in Table 34). The actual
instrument used is included in Appendix G. The instrument consisted of nine
three-point semantic differential type items regarding standardized
achievement testing. Directions for administering the test are also included
in Appendix G. The person administering the test talked students through each
item using a direct-instruction mode (defining objectives, giving examples,
leading the students through examples, testing them to make sure that they
understood, and then proceeding to the test). None of the people
administering the test knew which classes were in which experimental group.
Appendix G also contains item statistics for each of the items in the test.
The reliability estimate of .80 for an eight-item test is quite high, and
scores were distributed fairly well (as shown in Appendix G).
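The text does not name the reliability coefficient used for the student attitude measure. Coefficient alpha is one common internal-consistency estimate that works with three-point items; the sketch below computes it from a hypothetical matrix of student-by-item scores and should not be read as the project's actual procedure.

```python
def population_variance(xs: list[float]) -> float:
    """Variance with denominator n (the ratio in alpha is unaffected by n vs n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def coefficient_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[s][i] is student s's score on item i."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    item_vars = [population_variance([row[i] for row in scores]) for i in range(n_items)]
    total_var = population_variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical responses of four students to three items scored 1-3.
scores = [
    [3, 3, 3],
    [2, 2, 3],
    [2, 1, 2],
    [1, 1, 1],
]
print(coefficient_alpha(scores))  # 0.9375
```

With only a handful of items, alpha is sensitive to every item, so an estimate of .80 on a short scale reflects items that covary strongly.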

Test-wiseness. Millman, Bishop, and Ebel (1965) defined test-wiseness as
"a subject's capacity to utilize the characteristics and formats of the test
and/or the test taking situation to receive a high score." According to
Millman et al., test-wiseness is "logically independent of the examinee's
knowledge of the subject matter." As a part of this project, we
differentiated between test-wiseness (strategies that allow students to get
the correct answer on a test even when they have no knowledge of the content
being tested) and test-taking skills (mastery of skills that allows students
to demonstrate knowledge that they do have about the content area instead of
being confused by a strange format or anxiety-provoking experiences). This
instrument combined both test-wiseness and test-taking skills.

The instrument was divided into three sections. The first part of the test
focused on test-wiseness skills following the outline proposed by Millman,
Bishop, and Ebel. Items in the following areas were generated:

1. Eliminate options which are known to be incorrect and choose from
among remaining options (deductive reasoning).

2. Choose neither or both of two options which imply the correctness of
each other.

3. Restrict choice to options which encompass all of two or more given
statements known to be correct.

6. Consider relevance of specific detail when answering a specific
item.

7. Recognize and use specific determiners (e.g., often, seldom, always,
never).

8. Recognize and make use of resemblances between the options and an
aspect of the stem.

9. When no other information is available, choose the longest
alternative.

10. Select the option which agrees grammatically with the stem.

The second area was related more directly to elimination and guessing
strategies, which are a part of the test-taking skills taught by the project.
Elimination and guessing are test-taking rather than test-wiseness skills
because they only help the student who has some knowledge about the content
being tested. The final section of the test focused on the student's ability
to follow directions that are different from what he or she is used to
getting. This skill was an important part of what the filmstrips attempted to
teach students.

Two forms of the "test-wiseness" test were developed. Each of these
forms was administered to five different individual students, and notes were
made about where students were having difficulty understanding the test or
where the items were not functioning as desired. Each form was then
administered to two classes (four classes in total), and results were
submitted to an item analysis program which provided difficulty levels,
point-biserial correlations, and subtest correlations. In addition, students'
reading ability was correlated with scores on the test. Using this
information, a final version of the test was developed using items from both
versions of the pilot test. The final version used approximately half of the
items that had originally been developed. Numerous changes in wording,
distractors, and arrangement of the items were made during this pilot testing.
The final version of the test consisted of 38 items with a reliability
estimate of .75, as noted in Table 33. The somewhat low reliability estimate is
in part a function of the difficulty level on part of the test. For
example, Part C had an average difficulty level of .863. As Hopkins and
Stanley (1981) point out, reliability estimates computed via measures of
internal consistency are very sensitive to extremely high or low difficulty
levels and are generally lower the farther the difficulty level is from .50. A
copy of the measure of test-wiseness, with the directions for administering the
test and selected item statistics for the three groups participating in the
project, is included in Appendix G.
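The link between difficulty and internal consistency comes from item variance: a dichotomous item answered correctly by a proportion p of examinees has variance p(1 - p), which peaks at p = .50. The sketch shows how little variance (and so how little room to covary with other items) an item at the Part C average of .863 has, alongside the Kuder-Richardson formula 20 (KR-20), one internal-consistency estimate for dichotomous items; whether the project used KR-20 specifically is an assumption.

```python
def item_variance(p: float) -> float:
    """Variance of a 0/1 item answered correctly by a proportion p of examinees."""
    return p * (1.0 - p)


def kr20(item_difficulties: list[float], total_score_variance: float) -> float:
    """Kuder-Richardson formula 20 for k dichotomous items."""
    k = len(item_difficulties)
    if k < 2 or total_score_variance <= 0:
        raise ValueError("need at least 2 items and a positive total variance")
    sum_item_var = sum(item_variance(p) for p in item_difficulties)
    return k / (k - 1) * (1.0 - sum_item_var / total_score_variance)


# Item variance is largest at p = .50 and shrinks as p moves toward 0 or 1.
for p in (0.5, 0.7, 0.863, 0.9):
    print(p, round(item_variance(p), 3))
# 0.5 0.25
# 0.7 0.21
# 0.863 0.118
# 0.9 0.09
```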

Accuracy checks on coded data. All data collected with the instruments
described above were subjected to accuracy checks. First, all computations
(including percentages and score sums) were computed twice. Second, data that
were transferred from one form to another or entered from standardized testing
reports were checked to make sure the correct number had been entered in the
correct column for the right person. After all data were entered in the master
file, frequencies and descriptive statistics (means, standard deviations,
minimums, and maximums) were computed and checked against possible values.
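The final check against possible values can be sketched as below. The field names and ranges are hypothetical; only the 38-item length of the test-wiseness measure and the 0 to 100 scale of the percentage measures come from the text.

```python
# Hypothetical field names and possible ranges.
POSSIBLE_RANGES = {
    "checklist_pct": (0.0, 100.0),
    "teacher_on_task_pct": (0.0, 100.0),
    "test_wiseness_total": (0, 38),  # 38 items on the final test-wiseness test
}


def describe(values: list[float]) -> dict[str, float]:
    """n, mean, sample SD, minimum, and maximum of one variable."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5 if n > 1 else 0.0
    return {"n": n, "mean": mean, "sd": sd, "min": min(values), "max": max(values)}


def out_of_range(records: list[dict[str, float]]) -> list[tuple[int, str, float]]:
    """List (row, field, value) for every value outside its possible range."""
    problems = []
    for row, record in enumerate(records):
        for field, (low, high) in POSSIBLE_RANGES.items():
            value = record.get(field)
            if value is not None and not low <= value <= high:
                problems.append((row, field, value))
    return problems


records = [
    {"checklist_pct": 80.6, "test_wiseness_total": 30},
    {"checklist_pct": 108.0, "test_wiseness_total": 41},  # two keying errors
]
print(out_of_range(records))
# [(1, 'checklist_pct', 108.0), (1, 'test_wiseness_total', 41)]
print(describe([r["test_wiseness_total"] for r in records]))
```

A range check only catches impossible values; a plausible but wrong entry still needs the double-entry comparison described above.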

Summary

Various sources of data were used to examine the effect of the
experimental treatment on students and teachers. Standardized achievement
test scores, teacher and student attitude towards testing, teacher and student
on-task behavior, students' test-wiseness, and the teacher's quality of test
administration were all considered. Data from these instruments were combined
to form 13 dependent measures, which were used in the statistical analyses.
The intercorrelations between the 13 dependent measures are presented in Table
39.
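An intercorrelation table such as Table 39 holds one Pearson correlation for every pair of measures; with 13 dependent measures that is 13 x 12 / 2 = 78 distinct coefficients. The sketch below computes them for a hypothetical dictionary of measures with complete data; missing-data handling, which a real data file would need, is omitted.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def intercorrelations(measures: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Upper triangle of the correlation matrix: one entry per pair of measures."""
    names = list(measures)
    return {
        (a, b): pearson_r(measures[a], measures[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical scores for five students on three measures.
measures = {
    "student_attitude": [10.0, 11.0, 12.0, 13.0, 14.0],
    "test_wiseness": [20.0, 22.0, 24.0, 26.0, 28.0],
    "on_task_pct": [95.0, 90.0, 85.0, 80.0, 75.0],
}
for pair, r in intercorrelations(measures).items():
    print(pair, round(r, 2))
```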

The best means of determining what was really being measured by these
different instruments, particularly those that were developed locally, is to
examine carefully the copies of the instruments included in Appendix G. All
but the standardized achievement tests are included in essentially the same
format, including directions, in which they were administered during the
project. Some spacing changes have been made to accommodate the item
statistics reported in Appendix G, but the wording and order of items are
identical. The reader is encouraged to consult these instruments carefully
in interpreting the results reported in Chapter IV.
Table 39

CHAPTER IV
RESULTS AND CONCLUSIONS

As described earlier, the research objectives of this project were to:

1. Determine the effectiveness of training materials developed to teach
elementary school students test-taking skills, motivate students to
do their best on standardized achievement tests, and train teachers
in standardized test administration skills.

2. Determine the relationship between scores on standardized achievement
tests and students' test-taking skills, teachers' test administration
skills, and students' level of motivation.
To provide information about these two objectives, data were collected from 58
classes (containing over 1,400 second-grade students) that were randomly
assigned to one of three groups. Experimental Group I (E1) received training
in test-taking skills (filmstrips and practice tests), reinforcement
procedures, and training in standardized test administration. Experimental
Group II (E2) received only the training in test-taking skills (filmstrips and
practice tests), and the Control Group (C) received no special curriculum or
training procedures related to administering or taking standardized
achievement tests. Data were collected from each group about:

1. Teachers' perceptions of the value of the training materials and
procedures.

2. Teacher and student attitude towards standardized achievement testing
and behavior during standardized achievement testing.

3. Students' scores on the standardized achievement test.
In addition, substantial demographic and implementation data were collected to
assist in interpreting the results of the dependent measures described above.
A listing of the data collected during the project is contained in Appendix H,
along with a description of the data file (i.e., variable names, labels, and
columns in which data are located). The complete data file is available from
the authors. The remainder of this section reports the results of the
analyses used to answer the two major research objectives outlined above and
uses those results to draw conclusions about the project.

Effectiveness of Training Materials 

The degree to which the training materials and procedures were
effective in teaching students test-taking skills, motivating students to do
their best on standardized achievement tests, and teaching teachers skills in
standardized test administration can be judged in terms of teachers'
perceptions of the project and the objective data gathered by standardized
achievement tests, locally developed instruments, and observers who were
uninformed about the nature of the project. The results of the data
collection in each of those areas are summarized below.

Teachers' Perceptions

As noted previously in the Implementation Section, most components of the
project were viewed very positively by teachers. The filmstrips and practice
tests were particularly well received. For example:

• 84.2% of the teachers felt the filmstrips were worth the time and
effort required.

• 78.9% of the teachers plan to use the filmstrips next year.

• 94.7% of the teachers felt the filmstrips taught concepts which were
important for students to learn.

• 79.0% of the teachers felt the practice tests adequately prepared the
students for standardized testing.

• 76.3% of the teachers plan to use the practice tests in the future.

• 84.2% of the teachers felt the practice tests were worth the time and
effort required.

• 76.3% of the teachers felt the benefits of the total project were
worth the investment of time.

• 73.7% of the teachers felt the project was enjoyable for students.

• 81.6% of the teachers felt the project benefited students' test-taking
skills.

• 78.9% of the teachers felt taking tests was a less anxiety-provoking
experience for students as a result of the project.

Teachers' perceptions of the procedures for teaching standardized test
administration skills were also positive. Seventy-one percent of the
participating teachers felt that they were better test administrators as a
result of the workshops. Typical comments from teachers concerning the
training in standardized test administration were as follows:

• "Very informative. Excellent."

• "Most beneficial of all. Really prepared me for giving the test.
Ideas presented were not in the manual; taking the test taught me
what to teach kids for taking the test."

• "Super, really good; we could have spent two days on it."

The procedures used to motivate students to try their best on tests were
viewed less positively. For example:

• 53.3% of the teachers felt that the motivational procedures were
difficult for students to understand.

• Only 38.3% of the teachers plan to use the motivational procedures in
the future.

• Only 38.1% of the teachers felt that the procedures motivated students
to improve their scores from practice test to practice test.
Although there were exceptions, typical teacher comments about the
reinforcement procedures were as follows:

• "Don't like it."

• "Negative, because it was too time consuming; kids could not do it
without my help."

• "Hard for slow students who did not get many points; more reinforcing
for high students."

• "Good but not great. Too many kids and lots of cheating."

• "Not very reinforcing."

The teachers' ratings of the reinforcement procedures, in conjunction with
their comments, indicate that the reinforcement procedures were the weakest
part of the project.

In summary, then, these data indicate that teachers generally felt very
positive about the components of the project designed to teach test-taking
skills to children and standardized test administration procedures to
teachers. They were not as positive about the procedures designed to motivate
students to try their best on tests. Teachers not only liked the project but
felt that it was having a positive impact on students' test-taking skills and
on the quality of test administration. A strong indicator of teachers'
perceptions of the project's value is that they plan to continue using the
materials in the future, when there will no longer be any "requirement" from
the project to do so.

The following comments from teachers, collected during the project
debriefing, underscore teachers' positive evaluation of the training materials
and procedures.

• "Liked the program; really needed to teach these concepts; kids had
good feelings."

• "Although it was hectic at the end, the training really showed during
the test. This was the easiest administration of the SAT I've had in
eight years. Really helped our English As A Second Language children
who otherwise would have been wiped out by this experience. Wished
we would have had similar training for math."

• "Enjoyed. Worthwhile. Kids were more relaxed."

• "Kids were very relaxed this year. Easiest administration of SAT in
10 years of teaching. Entire project easy to plan, prepare, and
administer for teacher."

• "Excellent preparation for the test. Kids really learned to
proofread and used these skills with other assignments."

Differences Between Groups on Outcome Variables 

As noted earlier, classes participating in the project were randomly
assigned to one of three groups. Experimental Group I (E1) received all
components of the project, including training students in test-taking skills
(using filmstrips and practice tests), training teachers in standardized test
administration procedures, and procedures to motivate students to do their
best on standardized achievement tests. Participants in Experimental Group II
(E2) received only the student training in test-taking skills (filmstrips and
practice tests) and did not receive the teacher training in standardized test
administration or the reinforcement procedures. It should be noted that
although teachers in E2 were not explicitly trained in standardized test
administration, the structured way in which practice tests were administered
did provide them with frequent practice in administering standardized tests,
which may have transferred to some degree to their administration of the
actual standardized test. Because this training was implicit rather than
explicit, as it was in E1, it was not anticipated that it would have a very
powerful effect on teachers' performance during the actual standardized test.
Participants in the control group did not receive any of the project
materials.
Table 40 summarizes the results for each of the outcome measures
according to these three groups. Included in Table 40 are means, medians,
control group standard deviations, probability levels from one-way analyses of
variance, and measures of effect size differences between the groups for all
participating students.
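A minimal sketch of the two statistics summarized in Table 40: the F ratio from a one-way analysis of variance across the three groups, and the effect size defined in the table's footnote as the difference between the highest and lowest groups divided by the control group's standard deviation. The group data are hypothetical. The probability level itself would come from the F distribution (for example, `scipy.stats.f.sf(F, df_between, df_within)`), which is left out here to keep the sketch free of third-party dependencies.

```python
import statistics


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


def effect_size(highest: float, lowest: float, control_sd: float) -> float:
    """(highest group - lowest group) / control-group SD, as in the Table 40 note."""
    return (highest - lowest) / control_sd


# Hypothetical scores for the three groups.
control = [1.0, 2.0, 3.0]
e2 = [2.0, 3.0, 4.0]
e1 = [3.0, 4.0, 5.0]

f_ratio, df_b, df_w = one_way_anova_f([control, e2, e1])
print(f_ratio, df_b, df_w)  # 3.0 2 6
print(effect_size(statistics.mean(e1), statistics.mean(control),
                  statistics.stdev(control)))  # 2.0
```

Because the effect size divides by the control group's SD rather than a pooled SD, it expresses each difference in units of the untreated group's spread, which is the common metric the footnote refers to.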

Results in Table 40 are also reported for three additional subgroups of
students. The first subgroup consisted only of students who received a
majority of the treatment, excluding students who were in special education
programs or for whom English was not their primary language. As noted in the
footnote to Table 40, this subgroup was defined as those students who viewed
five or more of the filmstrips, took three or more of the practice tests, were
not in special education or English As A Second Language programs, and had
teachers who were not rated at the bottom of the scale on quality of
implementation or support for the project. The second subgroup consisted of
students who received all of the treatment. This category was defined in the
same way as the preceding one, except that it was limited to those students
who saw all nine filmstrips and took all seven practice tests. A third
subgroup considered only students in Title I programs who received all of the
treatment.
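The subgroup definitions above amount to filters on each student's record. The record fields below are hypothetical. The text does not say how control-group students, who saw no filmstrips, were screened; the sketch assumes the exposure criteria apply only to the experimental groups and that the other exclusions apply to everyone.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    """Hypothetical record layout; field names are illustrative only."""
    group: str                 # "E1", "E2", or "C"
    filmstrips_viewed: int     # 0 to 9
    practice_tests_taken: int  # 0 to 7
    special_education: bool
    esl: bool                  # English As A Second Language program
    teacher_rated_low: bool    # bottom of scale on implementation or support
    title_i: bool


def passes_exclusions(s: StudentRecord) -> bool:
    """Exclusions applied to every subgroup."""
    return not (s.special_education or s.esl or s.teacher_rated_low)


def exposure_ok(s: StudentRecord, min_filmstrips: int, min_tests: int) -> bool:
    # Assumption: control students are not screened on treatment exposure.
    if s.group == "C":
        return True
    return s.filmstrips_viewed >= min_filmstrips and s.practice_tests_taken >= min_tests


def majority_of_treatment(s: StudentRecord) -> bool:
    return passes_exclusions(s) and exposure_ok(s, 5, 3)


def all_of_treatment(s: StudentRecord) -> bool:
    return passes_exclusions(s) and exposure_ok(s, 9, 7)


def title_i_all_of_treatment(s: StudentRecord) -> bool:
    return s.title_i and all_of_treatment(s)


if __name__ == "__main__":
    partial = StudentRecord("E1", 6, 4, False, False, False, True)
    full = StudentRecord("E2", 9, 7, False, False, False, True)
    print(majority_of_treatment(partial), all_of_treatment(partial))  # True False
    print(all_of_treatment(full), title_i_all_of_treatment(full))     # True True
```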

The analyses for these three additional subgroups were performed to
determine whether the program had differential effects for certain types of
students. As can be seen in Table 40, the outcomes are very similar for all
four groups. Therefore, the discussion which follows refers to the total
group of students except where noted. It should also be noted that the
probability levels derived from the analyses of variance reported in the first
two columns were computed based on mean differences between groups. As is
well known, in cases where distributions of scores are skewed, medians rather
Table 40

Scores on Dependent Variables by Experimental Group

C = Control Group; 1 = Experimental Group I (E1); 2 = Experimental Group II (E2).
Within each row, the three means and the three medians are listed in the group
order shown. [?] marks an entry or digit that is illegible in the source copy.

Variable / Students                Group Order   Means                Medians              p (c)  SD     ES (d)

Teacher Attitude (Total)
  All students                     C < 2 < 1      85.7  87.9  89.1     85.5  87.2  89.8    .000   12.3   .35
  Majority of treatment (a)        C < 1 < 2      85.9  86.2  88.4     85.5  86.7  87.4    .014
  All of treatment (b)             1 < C < 2      83.7  85.9  87.0     81.3  85.5  91.6
  Title I, all of treatment (b)    1 < C < 2     *85.3  86.5  [?]      85.4  85.3  87.5

Teacher Attitude (Opinion)
  All students                     1 < C < 2      14.7  14.9  15.1     14.8  14.9  15.1    .292    3.1   .10
  Majority of treatment (a)        1 < C < 2      13.9  15.0  15.6     14.3  14.9  15.6    .000
  All of treatment (b)             1 < C < 2      13.2  15.0  15.3     13.5  14.9  15.4
  Title I, all of treatment (b)    1 < 2 < C      13.3  14.7  15.5     14.0  15.0  15.7

Teacher Attitude (Feeling)
  All students                     C < 1 < 2      19.5  20.7  20.9     19.6  20.2  20.7    .000    3.1   .35
  Majority of treatment (a)        C < 1 < 2      [?]   20.7  20.1     19.7  19.9  20.7    .001
  All of treatment (b)             C < 2 < 1     *19.6  20.6  20.1     19.7  19.8  19.9
  Title I, all of treatment (b)    C < 1 < 2      19.4  20.3  20.7     19.5  20.0  20.2

Teacher Attitude (Use)
  All students                     1 < C < 2     *24.8  25.9  25.6     23.3  25.8  25.6    .000    4.2   .55
  Majority of treatment (a)        1 < C < 2     *24.3  26.0  25.6     23.1  25.9  26.9    .000
  All of treatment (b)             1 < C < 2      23.1  26.0  25.7     21.6  25.9  26.8
  Title I, all of treatment (b)    1 < C < 2     *22.9  22.1  24.7     21.4  24.6  26.2

Teacher Attitude (Increase)
  All students                     2 < C < 1      10.0  10.1  10.5     10.1  10.1  10.9    .000    2.5   .32
  Majority of treatment (a)        C < 2 < 1     *10.1  10.0  10.3     10.1  10.1  10.4    .160
  All of treatment (b)             1 < 2 < C     *10.1   9.7  10.1      9.4   9.7  10.0
  Title I, all of treatment (b)    2 < C < 1       9.4  10.0  10.7      9.7  10.2  11.3

Teacher Attitude (Students' Feeling)
  All students                     C < 2 < 1      15.3  16.2  18.3     15.5  16.6  18.2    .000    3.3   .82
  Majority of treatment (a)        C < 2 < 1      15.3  16.6  17.6     15.5  17.1  17.9    .000
  All of treatment (b)             C < 1 < 2     *15.3  17.2  15.8     15.5  16.4  16.8
  Title I, all of treatment (b)    C < 2 < 1      15.6  16.5  18.1     15.9  17.0  19.8

Teacher On-Task (Teacher-Directed)
  All students                     C < 2 < 1      59.2  73.1  77.1     63.9  89.[?] 98.3   .000   49.0   .60
  Majority of treatment (a)        C < 2 < 1      59.4  77.0  83.6     63.8  95.4  99.0    .000
  All of treatment (b)             C < 2 < 1      59.4  71.5  79.7     68.8  95.8  96.0
  Title I, all of treatment (b)    C < 2 < 1      55.0  74.5  78.2     68.9  96.1  97.1

Teacher On-Task (Student-Directed)
  All students                     C < 2 < 1     *83.2  78.4  80.6     87.7  88.7  92.8    .048   37.7   .14
  Majority of treatment (a)        C < 2 < 1     *83.1  81.2  81.6     88.0  88.7  93.8    .665
  All of treatment (b)             C < 2 < 1     *83.1  81.2  79.7     88.0  88.9  93.8
  Title I, all of treatment (b)    C < 2 < 1     *83.4  77.9  78.6     87.8  88.0  94.3

a Eliminating students who saw fewer than 5 filmstrips, took fewer than 3
practice tests, had teachers who were rated low on quality of implementation
or support, were in special education programs, or had English as a second
language.

b Eliminating students who saw fewer than 9 filmstrips, took fewer than 7
practice tests, had teachers who were rated low on quality of implementation
or support, were in special education programs, or had English as a second
language.

c All probability estimates are based on one-way analyses of variance between
means of the three groups. In many cases, distributions are substantially
skewed, so medians are a better indicator of central tendency; medians for
each group on all variables are therefore also reported. Asterisks indicate
where the order of groups differs depending on whether means or medians are
reported. When there is a disagreement, the order of groups shown always
follows the medians.

d The column labeled ES refers to the standardized difference between the
highest and lowest groups, (X high - X low) / SD control group. This measure
was recommended by Glass (1977) for examining the results of various studies
using a common metric.
Table 40 (cont'd)


Variable
Control Group
All Students: p(c), SD, ES(d)
Students Receiving Majority of Treatment(a): p(c)
Students Receiving All of Treatment(b)
Title I Students Receiving All of Treatment


Achievement Test X
(Student-Directed)

Md


*-.08 .08 .01

1 < C < 2 .033 .9 .19
.19 .33 .38


Ml .28 .16 ■ 
1 < C < 2 .024 
.38 .42 .43 


.08 .28 .33
1 < C < 2
.38 .43 .48


-.86 -.40 -.13

1 < C < 2
-.85 -.28 .23


Achievement Test X
(Teacher-Directed)

Md


-.10 .04 .07

1 < C < 2 .023 .9 .24
-.06 .09 .32


*.17 .12 .20

C < 1 < 2 .508
.22 .29 .40


*.n .12 .29 

C < 1 < 2 
.23, 23 ..SL 


-.69 -.37 -.19 
1 < C < 2 
....,,67 -,3L~ -M. 


Achievement Test X
(Math)

Md


-.11 -.07 .19

1 < 2 < C .000 .9 .36
-.12 -.06 .24


.05 .06 .30 

2 < 1 < C .000 
.09 .11 .37 


.13 .22 .30

1 < 2 < C 
.13 .32 .79 


-.65 -.41 -.11 

1 < 2 < C 
-.62 -.48 -.18 


Achievement Test X
(Total Reading)

Md


*-.10 .07 .05

1 < C < 2 .013 .9 .29
.05 .22 .34


*.11 .30 .20

1 < C < 2 .058 
.22 .40 .40 


.08 .26 .35 

1 < C < 2 
.18 .40 .46 


-.86 -.47 -.21 

1 < C < 2 
-.86 -.39 .08 


Achievement Test X 
(Total) 

Md 


-.12 .01 .13 
1 < 2 < C .000 .9 .22 
■ .03 .13 .25 


.12 .17 .30 

1 < 2 < C .014 
.21 .27 .37


*.13 .34 .30

1 < 2 < C
.20 .36 .37


-.81 -.37 -.28 

1 < C < 2 
-.72 -.40 -.07 


Quality of Test X
Administration

Md


*48.8 50.6 49.7 

C < 2 < 1 .000 3.8 .28 
48.3 50.9 52.1 


48.8 51.3 52.3 

C < 2 < 1 .000 
48.3 51.6 52.4 


48.8 51.1 52.6 

C < 2 < 1 
48.3 51.6 52.5 


49.4 50.7 52.1
C < 2 < 1
50.3 51.7 52.4 


Student Attitude X

Md


11.7 11.9 12.4

2 < 1 < C .011 3.5 .20

11.2 11.3 12.1 


11.5 11.6 12.4 

1 < 2 < C .000 
11.05 11.06 12.2


11.6 11.7 12.4

1 < 2 < C
11.15 11.20 12.2


11.2 11.5 12.3

1 < 2 < C
11.0 11.0 11.7 


Student On-Task X
(Teacher-Directed)

Md


88.4 89.2 89.7

1 < C < 2 .785 11.1 .32
90.8 92.5 94.4


*89.4 89.5 89.2

1 < C < 2 .992
90.5 94.0 94.8


*87.1 86.8 89.5 

1 < 2 < C 
87.3 91.0 94.0 


89.4 88.5 92.1 
1 < C < 2 

87.5 94.8 98.0 


Student On-Task X
(Student-Directed)

Md


*90.5 90.6 89.9

C < 1 < 2 .911 9.3 .14
92.5 93.4 93.8


89.5 90.9 93.0

2 < C < 1 .429
93.7 94.3 94.5 


*92.3 87.6 90.9 

1 < 2 < C 
93.3 93.6 94.3 


89.4 90.8 93.4 

C < 1 < 2 
89.3 91.5 95.0 



than means are a better indicator of central tendency. In a few cases (noted
by asterisks in the row for means), the median scores for groups were in a
different order than the means. In other cases, medians substantially reduced the
differences or, in a few cases, ...

Therefore, the probability levels given are only one source of information and
should not be overinterpreted.
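The asterisk convention in Table 40 matters because a skewed distribution can make means and medians rank groups differently. The sketch below uses made-up percent-on-task scores (not project data). One group sits near the ceiling except for a single low score, which produces the long lower tail described in the text.

```python
from statistics import mean, median

# Hypothetical percent-on-task scores for two groups (not project data).
# Group A clusters near the ceiling but has one very low score, giving a
# negatively skewed distribution; Group B is tightly clustered.
group_a = [98, 98, 97, 96, 40]
group_b = [90, 90, 89, 88, 88]

for name, scores in [("A", group_a), ("B", group_b)]:
    print(name, "mean =", mean(scores), "median =", median(scores))

# The means rank B above A, while the medians rank A above B.
print("mean order:  ", "A < B" if mean(group_a) < mean(group_b) else "B < A")
print("median order:", "A < B" if median(group_a) < median(group_b) else "B < A")
```

In a table that follows the medians, this pair would carry an asterisk.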

The most meaningful information about program effect is the Effect Size
(ES) measure given as the last column for the "All Students" data. This
Effect Size measure is an indicator of the standardized difference between the
highest and lowest groups, using the standard deviation of the control group as
the standardizing metric. For most educational measures, an effect size of
less than 1/3 of a standard deviation is not considered practically
significant, even though it may be statistically significant. Statistical
significance indicates the probability of obtaining differences as large as or
larger than those observed in the experiment if one were to randomly draw
samples of the same size from the same population. When sample
sizes are quite large (as in this project), it is not unusual to obtain
statistical significance even though the differences are of little educational
or practical importance. The effect size differences between groups
reported in the last column of the "All Students" subcategory are computed
from medians instead of means and should be used in conjunction with the
probability levels and the order of groups in interpreting the results.
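The Effect Size definition and the point about large samples can both be made concrete. The sketch assumes ES = (high - low) / SD of the control group, as stated in the notes to Table 40. For two equal groups of n students it also uses a normal approximation in which z is roughly d * sqrt(n / 2). The function names and numbers are hypothetical and serve only as an illustration.

```python
import math

def effect_size(high: float, low: float, sd_control: float) -> float:
    """Standardized difference between the highest and lowest groups,
    using the control group's standard deviation as the metric."""
    return (high - low) / sd_control

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical group centers of 0.30 and 0.05 on a z-score scale with a
# control-group SD of 1.0 give ES = 0.25, below the 1/3 SD rule of thumb.
es = effect_size(0.30, 0.05, 1.0)
practically_significant = es >= 1 / 3

# Normal approximation for two equal groups with standardized difference d:
# z is roughly d * sqrt(n / 2).
n_per_group = 500
z = es * math.sqrt(n_per_group / 2)
p = two_sided_p_from_z(z)
print(f"ES = {es:.2f}, practically significant: {practically_significant}")
print(f"z = {z:.2f}, two-sided p = {p:.5f}")
```

With 500 students per group, a quarter-SD difference is already far beyond conventional significance levels. It still falls below the practical-significance threshold the text uses.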

Additional information, including sample sizes, means, standard deviations,
and medians for all dependent variables broken down by experimental
groups for each of the various subsamples, is included in Appendix H. Also
included in Appendix H are similar breakdowns for the different districts that
participated in the project and descriptive statistics for each of the
variables collected by the project. None of these more detailed data alter
the basic interpretations presented in the following sections. The
detailed data were not included in the main body of the report because the
additional detail obscured rather than illuminated the major findings. Using
the results reported in Table 40 as the basic information and supplementing them
with other data as noted, the results of the project for each of the major
dependent variables for each of the experimental groups are summarized below.


Teacher and student attitudes and behavior during standardized
achievement tests. As can be seen in Table 40, there was approximately a
third of a standard deviation difference on Teachers' Attitudes Toward
Standardized Achievement Tests between Experimental Group I and the Control
Group, with Experimental Group II scoring in between. The major differences
were in teachers' perceptions of the usefulness of standardized achievement
tests and teachers' perceptions of how students felt about standardized
achievement tests. The order of the groups on this outcome measure is what
would have been predicted if the project were having its anticipated effect in
terms of training teachers to be more competent and informed users of
information from standardized achievement tests.

Teachers' on-task behavior during the administration of the standardized
achievement test also improved. For both student- and teacher-directed
subtests, there was approximately one-third of a standard deviation difference
between Experimental Group I and the Control Group, with Experimental Group II
falling in between. The largest difference (.6 standard deviation units) was
found on the teacher-directed subtest. This is not surprising, since the
procedures for teacher-directed subtests are much more complex in terms of
proper test administration and require a higher level of skill and more
constant involvement of the teacher. During the student-directed subtest,
directions are given once at the beginning of the test, and then students work
on their own until the time limit has elapsed.

In interpreting the differences between groups for teacher on-task
behavior, the definitions of on-task behavior described in the implementation
section of this report should be kept in mind. As would be expected, teacher
on-task time was defined similarly for both the instructional and data
collection components. Therefore, differences in the on-task
behavior of teachers indicate that teachers did indeed implement the types of
things that were taught during the training. The real issue is, of course,
whether these types of activities lead to a more valid test administration.
Definitive answers to the question of whether these behaviors lead to more
valid test scores are extremely complex.

The data indicating that teachers in Experimental Group I were more on-task
during the administration of the standardized achievement test are
supported by the ratings of quality of test administration. As noted in the
implementation section, these ratings were done by observers who were not
informed about the purpose of the experiment or the composition of the
groups. The differences between groups on quality of test administration are
smaller (approximately a quarter of a standard deviation) but are in the
direction that would be predicted if the project were having the anticipated
effect. Again, the real meaning of these results can best be interpreted by
looking at the types of items contained on the Quality of Test Administration
Checklist described in the implementation section. Item-level data from this
checklist are contained in Appendix G. Taken together, the data from the
teacher on-task behavior and the Quality of Test Administration Checklist
indicate that the project had a positive effect on the procedures used by
teachers to administer standardized achievement tests.
Data on student on-task behavior during standardized achievement
testing are more difficult to interpret. As shown in Table 40, there are
virtually no differences, statistical or educational, between groups when the
mean student on-task scores are considered. However, particularly for the
teacher-directed subtest, these distributions are substantially skewed, and
medians indicate approximately a third of a standard deviation difference
(favoring the control group for some subgroups and E2 for others). For the
student-directed test, means and medians are in a different order but are all
very close (approximately a tenth of a standard deviation difference between
high and low). The flip-flopping of scores depending on which subtest, which
subgroup, and whether means or medians are examined suggests that the differences
are not educationally meaningful. Furthermore, the very high levels of
student on-task behavior across all subtests and subgroupings (87% to 98%), and
the fact that student on-task behavior and achievement test scores are
uncorrelated when the student is the unit of analysis (r ranges from -.00 to .01; see Table 43),
suggest that the measure of student on-task behavior may not have been an
accurate measure of on-task behavior. There is an extensive body of
literature which suggests that on-task behavior is moderately related to
achievement levels in instructional settings, and it is reasonable to assume
that on-task behavior should be correlated with scores on standardized
achievement tests. Even though the interrater consistency for the student
on-task behavior measures was high, the data reported above raise concerns as
to whether the essence of student on-task behavior during standardized testing
was really measured.
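The argument about the on-task measure rests on student-level Pearson correlations. Below is a minimal sketch of that coefficient with hypothetical inputs; the function name is illustrative. It also handles the limiting case of scores piled at the ceiling: when every student receives the same on-task score, the coefficient is undefined and the sketch raises an error.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # e.g. every student observed on task 100% of the time: nothing varies
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, not project data.
print(pearson_r([1, 2, 3], [2, 4, 6]))        # perfectly linear relationship
print(pearson_r([1, 2, 3, 4], [1, 3, 3, 1]))  # no linear relationship
```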

Student achievement. As can be seen from Table 40, when median scores
are considered, Group II had the highest standardized achievement test scores
for all of the reading subtests, and the Control Group had the highest scores
for the math subtest and for the total test battery. Statistical tests of
significance computed from the analysis of variance using means were
significant for all achievement test measures, sometimes favoring Group II and
sometimes favoring the Control Group. The magnitude of the differences
between groups generally increases slightly when medians instead of means
are used, but the order of the groups changes for the total reading score and
the student-directed reading test score. Although statistically significant,
the differences are generally small (average ES = .26, ranging from .19 to .36
of a standard deviation). These already small differences are
reduced even further when the analyses are limited to those students who
received the majority or all of the treatment and only reading subtests are
considered (average ES = .19). In this subset of the data, E2 had the highest
scores for all three reading tests, with E1 second highest for one of
the subtests and the Control Group second highest for the other two.
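The average effect size quoted above can be checked against the ES column of Table 40 ("All Students") for the five achievement measures. The dictionary layout is only a convenience for this check.

```python
from statistics import mean

# ES values ("All Students" column) for the achievement measures in Table 40.
es_by_test = {
    "Student-Directed Reading": 0.19,
    "Teacher-Directed Reading": 0.24,
    "Math": 0.36,
    "Total Reading": 0.29,
    "Total Battery": 0.22,
}

avg = mean(es_by_test.values())
low, high = min(es_by_test.values()), max(es_by_test.values())
print(f"average ES = {avg:.2f}, range {low} to {high}")
```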

Math subtest scores were collected and analyzed for two reasons. First,
if the treatment were effective, it would be interesting to see if the results
generalized to other testing areas for which no explicit training was
included. Second, if the treatment did not appear to be effective, math
scores could be used as a partial check on the comparability of the
groups. In other words, if the math scores of the groups were
radically different, one would be concerned about the comparability of the
groups before treatment began, because one would not expect the treatment to
have as powerful an effect on the math subscores as on the reading subscores.
The facts that math and total scores are the only measures on which
the control group scored the highest, and that some of these effect
sizes are substantial (e.g., .66 in the "all of the treatment" subgroup),
suggest that there may be some sample comparability problems. This issue is
discussed further in a subsequent section. Although many of the differences
between groups are statistically significant, they are relatively small. The
only consistent finding in the order of these differences is that Group I was
regularly the lowest-scoring group. For reading test scores, Group II was
always the highest-scoring group, while for the math and total scores, the
Control Group was the highest-scoring group. The fact that the Control Group
scored highest on the total test is at least partly a function of the heavy
contribution of math scores to this total battery score.

Although not particularly clear-cut, these data indicate that the project
did not have a meaningful effect on student achievement test scores as had
been hypothesized. If the project was contributing to standardized
achievement tests being a more valid indicator of students' knowledge in a
particular content area, it was anticipated that scores would increase. In
other words, previous research had indicated that students' scores on
standardized achievement tests were at least partly influenced by the format
of the test, students' test-taking skills, and the degree to which students
were motivated to do well on the test. Each of these confounding variables
seemed to result in students appearing to know less than they really did.
Therefore, it was hypothesized that the experimental procedures would remove
these influences, causing test scores to increase. This clearly did not
happen.

Of course, it is possible for test scores to become more valid even
if the scores do not increase. This could happen if the pattern of correct
answers within a subtest changed. However, this is a much less likely
occurrence, and the most reasonable conclusion from the data is that the
intervention procedures had little, if any, effect on students' scores on
standardized tests.

Another possible explanation for the observed standardized achievement
test scores is that the reinforcement procedures (the portion of
the program least well received by teachers) actually depressed
students' scores instead of having the anticipated elevating effect. This hypothesis
is given some support by the fact that Group II, which did not receive the
reinforcement procedures, was consistently the highest-scoring group on the
measures most directly related to the treatment. However, even
here, the differences between E2 and the Control Group are always small
(ranging from .05 to .23 when all students are considered, and .00 to .27 when
only students who received the majority or all of the treatment are
considered). The average differences (.04 to .13) are not large enough to be
practically significant, and the fact that scores flip-flop from group
to group and from test to test suggests that random fluctuation is a more
plausible explanation.

The lack of effect of the intervention program on students'
standardized achievement test scores is shown graphically in Figure 5, which
contains Box and Whisker diagrams for each of the achievement subtests
(modified from Tukey, 1977). As shown on the next page, the box of a Box and
Whisker diagram depicts the interquartile range of scores. The "whisker"
extending from the box shows the range of scores. A crosshatch on the
"whisker" marks the 5th or the 95th percentile, and the small box at the end
of the "whisker" marks the most extreme score.
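The quantities drawn in these modified Box and Whisker diagrams can be computed from raw scores. This is a sketch only. It assumes Python's `statistics.quantiles` with its default ("exclusive") interpolation, which may differ from the percentile method actually used for Figure 5. The function name and the test data are hypothetical.

```python
from statistics import quantiles

def box_whisker_summary(scores):
    """Points drawn in the modified Box and Whisker diagram: the box spans the
    interquartile range, crosshatches mark the 5th and 95th percentiles, and
    small boxes mark the most extreme scores."""
    q1, q2, q3 = quantiles(scores, n=4)   # quartile cut points
    pct = quantiles(scores, n=20)         # 19 cut points: 5th, 10th, ..., 95th
    return {
        "min": min(scores),
        "p05": pct[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "p95": pct[-1],
        "max": max(scores),
    }

# Hypothetical scores 1 through 100.
print(box_whisker_summary(list(range(1, 101))))
```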

As can be seen in the Box and Whisker diagram for the student-directed
reading subtest in Figure 5, the interquartile
ranges for all three groups overlap almost completely (the same degree
of overlap is present for virtually all of the dependent variables). The
normal curves to the right of the Box and Whisker diagram for this subtest
show the amount of overlap that would occur if normal curves were
constructed using the mean and standard deviation of the different groups.
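The normal-curve overlap can be expressed as the area shared by two normal curves. The sketch below assumes equal standard deviations and uses a mean difference of .26 SD, the average achievement ES reported above, purely as an illustration. It relies on `statistics.NormalDist.overlap`, available in Python 3.8 and later.

```python
from statistics import NormalDist

# Two hypothetical groups on a standardized scale, equal SDs, means .26 SD apart.
control = NormalDist(mu=0.0, sigma=1.0)
treated = NormalDist(mu=0.26, sigma=1.0)

# Area shared by the two density curves (1.0 would mean identical curves).
overlap = control.overlap(treated)
print(f"overlapping area = {overlap:.3f}")
```

For equal SDs, the shared area equals 2 * Phi(-d / 2), where d is the standardized difference. At d = .26, roughly 90% of the two curves coincide, which fits the "almost complete" overlap described in the text.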
[Example Box and Whisker diagram (Teacher Attitude: Use of Tests), labeled from top to bottom: 95th %tile (which in this case is the same as the extreme high score); 75th %tile; 50th %tile (median); 25th %tile; 5th %tile; extreme low score.]

As would be expected, overlap is again almost complete. The normal curves
with overlaying bar graphs shown below the Box and Whisker diagrams show the
actual data distribution (bar graphs) and how well it conforms to normal
curves. As can be seen, for all three groups the data are negatively skewed,
more so for Groups I and II. This negative skewing is also apparent from the
longer tails on the lower portion of the Box and Whisker diagrams for Groups I
and II. Figure 5 shows the same type of pattern for all of the achievement
test scores. The basic message from these diagrams is similar to the
conclusion outlined above, i.e., differences between the three groups on
achievement test scores are small and educationally insignificant.






Figure 5

Box and Whisker Diagrams and Normal Curve Representations
for Student-Directed Reading Achievement Subtest
(Square Root Transformation)

Note: Data used in the Box and Whisker diagrams have been transformed as indicated, using square
root or log transformations to make the distributions of each group more comparable
(Tukey, 1977).

Figure 5 (cont'd)

Box and Whisker Diagrams and Normal Curve Representations
for Teacher-Directed Reading Subtest
(Log Transformation)

Figure 5 (cont'd)

Box and Whisker Diagrams and Normal Curve Representations
for Total Math Subtests
(Square Root Transformation)

Figure 5 (cont'd)

Box and Whisker Diagrams and Normal Curve Representations
for Total Reading Subtests
(Square Root Transformation)

Figure 5 (cont'd)

Box and Whisker Diagrams and Normal Curve Representations
for Total Achievement Test Scores
(Square Root Transformation)
Several other analyses were done to better understand the pattern of
scores on standardized achievement tests. Crosstabs of total achievement test
scores with various other demographic characteristics and implementation
variables were computed. These analyses are reported in Table 41. As
can be seen, test scores were related to the number of
filmstrips viewed by students, the number of practice tests taken, teacher support
of the program, quality of teacher implementation, the number of reinforcement
points earned, the mean percentage correct on practice tests, and whether
students were in special education, Title I, or English As A Second Language
programs. Some of these relationships were curvilinear and consequently do
not appear in the correlational data reported in Table 42 later in this
report.
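Table 41 reports n and mean for each category, so the category means can be checked against the group totals. The n-weighted average of the category means should reproduce the overall mean. The sketch below uses the Experimental Group I and II rows for number of filmstrips viewed; the helper name is hypothetical.

```python
# Rows of Table 41 by number of filmstrips viewed:
# (category, n, mean total achievement score in z units)
group_1 = [("1-6", 30, -0.69), ("7", 31, -0.51), ("8", 119, -0.14), ("9", 327, -0.03)]
group_2 = [("1-6", 20, -0.22), ("7", 58, 0.05), ("8", 99, -0.17), ("9", 240, 0.09)]

def pooled_mean(rows):
    """Return (total n, n-weighted mean of the category means)."""
    total_n = sum(n for _, n, _ in rows)
    return total_n, sum(n * m for _, n, m in rows) / total_n

for label, rows in [("Group I", group_1), ("Group II", group_2)]:
    n, m = pooled_mean(rows)
    print(f"{label}: n = {n}, pooled mean = {m:.3f}")
```

Both results agree with the totals printed in the table (n = 507, mean -.12 for Group I; n = 417, mean .008 for Group II).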

The fact that students who saw the most filmstrips received the highest
scores on achievement tests could be viewed as an indicator that the program
had indeed contributed to higher scores on standardized achievement tests.
However, this type of correlational data is a much weaker indicator of
causality than the data reported previously in Table 40, because a number of
directionality and third-variable explanations are plausible
(even though they are impossible to examine directly with the available data).
For example, in E1 the average achievement test score (reported in z scores)
went from -.69 for students viewing 1 to 6 filmstrips, to -.51 for students
viewing 7, to -.14 for students viewing 8, to -.03 for students viewing 9
filmstrips, an increase of .66 standard deviation units. However, those
students who saw all 9 filmstrips are more likely to be students who have good
attendance records in school and consequently are being exposed to more
instruction, probably have better attitudes towards school as indicated by
their better attendance, and likely come from homes where more value is placed
on education. Although most teachers did make substantial efforts to
Table 41
Total Achievement Test Scores by Group by Various Independent Measures

Number of Filmstrips Viewed | Experimental Group I | Experimental Group II
1-6 | n = 30 (5.9%), Mean = -.69, SD = 1.23 | n = 20 (4.8%), Mean = -.22, SD = 1.07
7 | n = 31 (6.0%), Mean = -.51, SD = 1.25 | n = 58 (14.0%), Mean = .05, SD = 1.12
8 | n = 119 (23.5%), Mean = -.14, SD = 1.00 | n = 99 (24.0%), Mean = -.17, SD = 1.05
9 | n = 327 (65.0%), Mean = -.03, SD = .98 | n = 240 (58.0%), Mean = .09, SD = .98
Total | n = 507, Mean = -.12, SD = 1.04 | n = 417, Mean = .008, SD = 1.02
p < .06

[illegible] | Experimental Group I | Experimental Group II
[illegible] | n = 24 (4.7%), Mean = -.70, SD = .98 | n = 24 (6.0%), Mean = -.12, SD = .93
[illegible] | n = 173 (34.0%), Mean = -.14, SD = .98 | n = 178 (43.0%), Mean = -.10, SD = 1.04
[illegible] | n = 312 (61.3%), Mean = -.07, SD = 1.06 | n = 215 (52.0%), Mean = .11, SD = 1.01
Total | n = 509, Mean = -.12, SD = 1.04 | n = 417, Mean = .008, SD = 1.02
p < .05
[illegible] | Experimental Group I | Experimental Group II
1-5 | n = 62 (12.0%), Mean = -.32, SD = 1.10 | n = 40 (9.6%), Mean = -.32, SD = 1.03
[illegible] | n = 141 (28.0%), Mean = -.17, SD = 1.03 | n = 101 (24.0%), Mean = -.16, SD = 1.09
[illegible] | n = 304 (60.0%), Mean = -.05, SD = 1.02 | n = 274 (66.0%), Mean = .13, SD = .98
Total | n = 504, Mean = -.12, SD = 1.04 | n = 416, Mean = .01, SD = 1.02
p < .05

[illegible] | Experimental Group I | Experimental Group II
[illegible] | n = 95 (18.7%), Mean = -.36, SD = 1.03 | n = 50 (12.0%), Mean = .003, SD = 1.04
[illegible] | n = 99 (19.4%), Mean = -.22, SD = 1.07 | n = 154 (37.0%), Mean = -.21, SD = 1.08
[illegible] | n = 315 (62.0%), Mean = -.02, SD = 1.02 | n = 213 (51.0%), Mean = .17, SD = .95
Total | n = 509, Mean = -.12, SD = 1.04 | n = 417, Mean = .008, SD = 1.02
p < .05
Table 41 (cont'd)

[illegible] | Experimental Group I
1-2.5 | n = [illegible] (19.0%), Mean = -.69, SD = .80
2.51-3.5 | n = 133 (27.0%), Mean = -.35, SD = .97
3.51-4.5 | n = 117, Mean = [illegible]
4.51-5.5 | n = 93 (19.0%), Mean = .30, SD = .96
5.51-9.5 | n = 49 (9.9%), Mean = .57, SD = .99
Total | n = 492, Mean = -.10, SD = 1.03, r(xy) = .37


Percentage Correct on Practice Tests | Experimental Group I | Experimental Group II
1-60% | n = 20 (4.1%), Mean = -1.67, SD = .82 | n = 54 (13.0%), Mean = [illegible], SD = 1.25
60.1-80% | n = 135 (27.0%), Mean = -.98, SD = .78 | n = 90 (22.0%), Mean = -.65, SD = .79
80.1-85% | n = 75 (15.0%), Mean = -.34, SD = .65 | n = 37 (9.0%), Mean = -.11, SD = .60
85.1-90% | n = 81 (16.5%), Mean = .07, SD = .68 | n = 57 (14.0%), Mean = .21, SD = .58
90.1-95% | n = 119 (24.0%), Mean = .67, SD = .58 | n = 94 (23.0%), Mean = .55, SD = .53
95.1-100% | n = 62 (13.0%), Mean = .87, SD = .68 | n = 81 (20.0%), Mean = .79, SD = .75
Total | n = 492, Mean = -.10, SD = 1.03 | n = 413, Mean = .01, SD = 1.03
p < .10



[illegible] | Experimental Group I | Experimental Group II | Control
No | n = 353 (69.0%), Mean = .31, SD = .82 | n = 295 (71.0%), Mean = .23, SD = .98 | n = 349 (77.0%), Mean = .34, SD = .85
Yes | n = 155 (31.0%), Mean = -1.11, SD = .77 | n = 120 (29.0%), Mean = -.52, SD = .95 | n = 102 (23.0%), Mean = -.60, SD = .78
Total | n = 508, Mean = -.12, SD = 1.03 | n = 415, Mean = .009, SD = 1.02 | n = 451, Mean = .13, SD = .92
p < .0004
Table 41 (cont'd)
do make-ups with the filmstrips, the fact that many students did not see some
of the filmstrips makes these explanations plausible and makes it somewhat
unreasonable to suppose that viewing the filmstrips alone resulted in higher test
scores. The same explanations can be offered for each of the other
variables where it looks like increased exposure to or participation in the
program resulted in higher test scores.
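The third-variable explanation can be illustrated with a small simulation. The model is hypothetical, with invented numbers rather than project data: attendance drives both how many of the 9 filmstrips a student sees and the student's score, and filmstrips have no effect at all. The raw correlation between filmstrips and scores is still clearly positive. It largely disappears once attendance is held constant through a first-order partial correlation.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1982)  # fixed seed so the sketch is reproducible
attendance, filmstrips, scores = [], [], []
for _ in range(2000):
    a = rng.uniform(0.5, 1.0)                    # hypothetical attendance rate
    f = sum(rng.random() < a for _ in range(9))  # filmstrips seen: present that day
    s = 2.0 * a + rng.gauss(0.0, 0.3)            # score depends on attendance only
    attendance.append(a)
    filmstrips.append(f)
    scores.append(s)

r_fs = pearson_r(filmstrips, scores)
r_fa = pearson_r(filmstrips, attendance)
r_sa = pearson_r(scores, attendance)
# Partial correlation of filmstrips and scores, holding attendance constant.
partial = (r_fs - r_fa * r_sa) / math.sqrt((1 - r_fa ** 2) * (1 - r_sa ** 2))
print(f"r(filmstrips, score) = {r_fs:.2f}; partial r given attendance = {partial:.2f}")
```

The project did not have an attendance measure that could support this kind of adjustment. That is why the text treats such explanations as plausible but impossible to examine directly.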

Scores for different groups of students (Title I, Special Education, and
English As A Second Language) are exactly what one would expect to see based
on our knowledge of the types of children who usually participate in those
programs. That achievement test scores for these groups fall in the predicted
direction lends some credibility to the test being used and, hence, more
confidence in the results reported above concerning the achievement test.

Table 42 contains the intercorrelation matrix for the principal
variables (dependent, independent, and descriptive) included in the
project. Generally, the correlations reported in this table support the kinds
of findings reported above. For example, student on-task behavior, teacher
on-task behavior, teacher attitude, student attitude, and quality of test
administration are generally uncorrelated with achievement test scores. The
consistency of these findings lends further support to the conclusion that the
procedures as implemented had very little impact on achievement test
scores.

Conclusions

The basic purpose of this project was to develop, implement, and
evaluate training materials and procedures which would result in more valid
standardized achievement test scores as a result of improvements in (a)
students' test-taking skills, attitudes, and motivation towards test taking,
Table 42

Intercorrelation Matrix for Project Variables 
and (b) teachers' attitudes towards standardized tests and quality of test
administration. As measured by the project-developed instruments, the
intervention procedures did result in improved teachers' attitudes towards
tests and quality of test administration. Furthermore, teachers were
enthusiastically supportive of the materials, plan to continue using them, and
felt that the materials resulted in substantial improvement in students'
test-taking abilities and students' attitudes towards tests. However, the more
objective data collected by the project indicate that there were no increases
in students' test-taking skills, attitudes toward standardized testing, or
performance on standardized achievement tests.

These data raise some perplexing questions, both in view of previous
research and in view of teachers' perceptions about the effectiveness of the
project. First, as reported in the review of literature in Chapter II,
previous research (from both published and unpublished sources) indicates that
training students in test-taking skills or providing them with practice in
taking tests has a substantial effect (approximately 2/3 of a standard
deviation) on test scores. Even when the results of that research are limited
to high-quality studies, the average effect attributable to training was
approximately a third of a standard deviation. The intervention in this
project combined training in test-taking skills with practice on tests
similar to the standardized achievement tests the students would be taking.
In addition, previous research reported in the review of literature has also
suggested that factors such as checking work, systematic elimination, problem
attack strategies, reduction of test anxiety, examiner/examinee relationships,
advance notification, and feedback on test performance are all positively
correlated with scores on standardized achievement tests. All of these
factors were included in the training packages. Finally, when
compared to the interventions in previous research, the training delivered to
these students was a relatively intense, systematically delivered training
experience of long duration, with strong follow-up and monitoring.* In
spite of this, very little difference was observed between the groups on test
scores, and most of these differences were not in the predicted direction. In
fact, those students who received the most training received the lowest
scores.

The fact that differences were not found is even more perplexing in light
of teachers' very positive response to the program materials. Most teachers
who used the materials during this year plan to continue using the materials
in the future and felt that the materials had improved their students' attitudes
and increased performance on the standardized achievement test. However, the
fact remains that none of these perceived differences were evident on the objective
measures for which data were collected. These contradictions with
previous research and with teachers' perceptions of benefit suggest that
further evaluation of the materials developed in this project should be
conducted before final conclusions are drawn.

A number of facts learned during this project should help make
further evaluation as meaningful as possible. The remaining two
sections summarize potential explanations for why the training materials and
procedures were not as effective as they might have been, and factors that should
be taken into account in conducting further research and evaluation.

Implementation Factors Possibly Related to Results 

Even though previous research suggests that an intervention of the type
delivered in this project should have led to substantial differences between



*As noted in the Procedures Section, there were some classrooms 
where the training was implemented less well or where some training was not 
delivered. However, excluding these classrooms from the analyses as reported 
in the Results Section made no practical difference. 
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groups on students' performance and attitudes during standardized achievement 
tests; few such differences were noted, and those that were, were relatively 
small. A closer examination of the data and the implement at ion procedures 
suggests a number of possible explanations. None of these can "be proven" as 
a causal agent in the results that were obtained. They are presented here to 
provide the reader with a more complete context in interpreting the results 
that have been presented as well as providing the background for further 
research and evaluation. 

Amount of practice per concept. As noted in the implementation section,
over 50 different concepts were presented to students in the filmstrips. Many
of these concepts are reasonably complex, such as elimination, differentiating
between correct answers and look-alikes or sound-alikes, deductive reasoning,
and checking work. During the filmstrips, students were provided with a
certain amount of practice in each of these concepts. The practice tests,
which were given on a different day from the filmstrip, provided additional
opportunity for practicing these concepts, even though the practice tests were not
designed to give explicit practice with each concept taught in an associated
filmstrip.

Because every filmstrip lasted from 20 to 40 minutes and contained four
or more major concepts, it is possible that students did not have enough
practice time to master each of the concepts taught in the filmstrips.
Breaking the filmstrips into smaller pieces and providing more opportunity for
practice may result in more effective instruction. However, this possibility
must be considered in light of the fact that the training provided in this
intervention was already much more substantial, and contained as much practice
or more, than most other efforts at training students in test-taking skills
reported in the literature.

Reinforcement procedures. Based on previous work by Taylor and White
(1982), as well as the previous research reported in the review of literature
in Chapter II, one component of Experimental Group I was designed to motivate
students to try their best on tests through a self-charting procedure.
Taylor and White (1982) demonstrated that paying students to perform better
than predicted from their pretest scores on standardized achievement tests
resulted in a difference of approximately half a standard deviation between
experimental and control groups. Because it was unacceptable to continue
paying students for their performance on standardized achievement tests, the
self-charting procedures associated with the practice tests were designed to
determine whether increased motivation could be generated during the practice
tests and whether such motivation would generalize to the standardized achievement test.
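The "half a standard deviation" result is a standardized mean difference. The report does not state which formula Taylor and White used; the minimal Python sketch below assumes Glass's delta (difference between group means divided by the control group's standard deviation), and the scores are invented purely to show the arithmetic.

```python
from statistics import mean, stdev


def glass_delta(experimental, control):
    """Standardized mean difference scaled by the control group's SD (Glass's delta)."""
    return (mean(experimental) - mean(control)) / stdev(control)


# Hypothetical scores for illustration only; these are not data from this study
# or from Taylor and White (1982).
control = [40, 44, 48, 52, 56]
experimental = [43, 47, 51, 55, 59]

print(round(glass_delta(experimental, control), 2))
```

With these made-up scores the sketch prints 0.47, roughly half a standard deviation. Cohen's d, which divides by a pooled standard deviation instead, is a common alternative and can give a somewhat different value when the two groups' spreads differ.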

Unfortunately, the design of the experiment did not allow the effects
of the reinforcement procedures to be estimated separately from the effects of
training students, training teachers, and participating in the practice tests.
However, teacher feedback indicated that the reinforcement procedures were the
weakest part of the program and were sometimes confusing for students. As
noted in the Procedures Section, the self-charting procedures used to motivate
students are reasonably complex to implement. Some teachers noted in the
debriefing that these particular procedures did not seem to be motivating for
students even on the practice tests. If the motivational procedures were
ineffective on the practice tests, the probability of increased motivation on
the actual standardized achievement test is virtually nonexistent. There is
even a possibility that, instead of being motivating, the so-called
reinforcement procedures were actually a negative influence for children in
Experimental Group I.

Format of practice tests. The practice tests were constructed to parallel
as closely as possible the standardized
achievement test that the students would be taking in the spring. The
last several practice tests contained several items for each subtest included
on the reading portion of that district's standardized achievement test. A
single practice test (which took a maximum of 30 minutes) contained up to 7
subtests, depending on the district. For each subtest, there was a sample item
and directions. Consequently, much of the practice test time was spent giving
directions and reviewing sample items. This may have reduced the
effectiveness of the practice tests because, instead of spending most of the
time practicing taking tests, students were spending substantial time
listening to directions and going over sample items. For the first several
practice tests, this was not a problem because the tests were relatively
short. By the time the problem was recognized in the longer practice tests,
it was too late to redesign the tests so that any given practice test would
have only one or two subtests.

Heterogeneity or non-comparability of classrooms within each experimental
group. Although classes were randomly assigned to each of the treatment
groups from a larger population that had expressed willingness to
participate in the program, there is a possibility that sampling fluctuation
could have resulted in non-comparable groups. Because pretest data were not
available, it is impossible to check this possibility directly. However,
several other sources of data were examined. First, the results of
standardized achievement tests for third grade students in the same schools
were examined. These data are reported in Table 43. As can be seen, the
results are reasonably comparable. Any bias that does exist would have
contributed to higher scores for Experimental Group I, which was the
lowest-scoring group on almost all of the achievement test scores.

It should be noted, however, that the fluctuation in the third grade
scores was as great as the differences in scores observed between
the three second grade experimental groups. Thus, although these data suggest
that such fluctuation is easily possible, the direction of the fluctuation at
the third grade leads one to believe that such fluctuation is not a primary
explanation for the results observed with the second grade students
participating in the study.
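Random assignment of intact classes to the three groups, as described above, can be sketched as follows. This is an illustrative Python sketch only: the classroom labels, the seed, and the round-robin dealing that keeps group sizes within one of each other are assumptions, not the procedure documented in the report.

```python
import random


def assign_classes(classes, groups=("E1", "E2", "Control"), seed=None):
    """Randomly assign whole classes to groups in near-equal numbers."""
    rng = random.Random(seed)  # seeded generator so an assignment can be reproduced
    shuffled = list(classes)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    # Deal the shuffled classes round-robin so group sizes differ by at most one.
    for i, cls in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(cls)
    return assignment


# Hypothetical pool of 30 volunteer classes; labels are placeholders.
volunteers = [f"class_{n:02d}" for n in range(1, 31)]
result = assign_classes(volunteers, seed=1982)
for group, members in result.items():
    print(group, len(members), members[:3])
```

Because whole classes rather than individual students are randomized, chance imbalance between groups is more likely when the number of classes is small, which is the possibility this section examines.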

A related possibility is that a small number of "outlier" teachers in
Groups E1 or E2 unduly affected the data. This possibility was examined by
constructing box-and-whisker plots for each of the dependent variables using
teachers as the unit of analysis. If there were "outliers" in Groups E1 or
E2, they would show up as extremely long tails for Groups E1 or E2, with
the hash mark for the 5th percentile being substantially nearer the box for
the Control Group than for Groups E1 or E2. These data are presented in Figure 6.
As can be seen from the data for the achievement test scores, there are no
major differences between the groups. It is apparent that, for all of the
groups, the distributions of several dependent measures, such as quality of
test administration and time on task for teachers, are negatively skewed.
However, these data do not suggest that a small number of teachers are unduly
affecting the scores of Groups E1 or E2 in a way that might lead to a
misinterpretation of the data. 
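Using teachers as the unit of analysis means collapsing each classroom's student scores to a single value before summarizing a group. The Python sketch below computes the percentiles a box-and-whisker plot would display; the data and names are hypothetical, and taking the whisker ends at the 5th and 95th percentiles (echoing the 5th-percentile hash mark mentioned above) is an assumption, since the convention used for Figure 6 is not stated here.

```python
from statistics import mean, quantiles


def teacher_means(scores_by_teacher):
    """Collapse each teacher's student scores to a single mean (teacher = unit of analysis)."""
    return {teacher: mean(scores) for teacher, scores in scores_by_teacher.items()}


def box_summary(values):
    """5th, 25th, 50th, 75th and 95th percentiles of a group's teacher means."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(values, n=20, method="inclusive")
    return {"p5": cuts[0], "q1": cuts[4], "median": cuts[9], "q3": cuts[14], "p95": cuts[18]}


# Hypothetical student scores keyed by group, then by teacher.
scores = {
    "E1": {"T1": [41, 45, 50], "T2": [38, 44], "T3": [52, 55, 49], "T4": [47, 43], "T5": [36, 40]},
    "Control": {"T6": [44, 48], "T7": [50, 46, 52], "T8": [39, 45], "T9": [55, 51], "T10": [42, 47]},
}

for group, by_teacher in scores.items():
    summary = box_summary(list(teacher_means(by_teacher).values()))
    print(group, {k: round(v, 1) for k, v in summary.items()})
```

A long lower tail for E1 or E2 would show up as a p5 value lying far below q1 relative to the corresponding gap for the Control Group.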

"John Henry" effect . The "John Henry effect" suggests that control group 
teachers who know they are being compared to an experimental treatment will
try harder and thus perform better than they would under typical conditions.
If such extra efforts were present on the part of control group teachers, the
results of the experiment would be invalidated. Because all of the control
group teachers were aware of the general nature of the study being conducted,
it is possible that such a "John Henry effect" existed. However, the results
of the debriefing interviews and the contact with control group teachers in
preparation for collecting the observational data suggest that such an effect
is unlikely.

Timing of implementation. The original schedule for implementation was
that teachers would show a filmstrip approximately every two weeks with
practice tests in between. Unfortunately, production schedules for the
filmstrips had to be altered due to unforeseen circumstances, and the first
three filmstrips were spread out over three months, with the last five
filmstrips shown in a period of only eight weeks. In addition, as noted
in the Procedures Section, none of the teachers in Nebo District showed
Filmstrip #7. Such irregularities in the implementation may have
attenuated gains that might otherwise have resulted from the filmstrips and practice
tests. Teachers in the debriefing interviews did not feel this was a serious
problem, but it is hard to estimate what effect the scheduling irregularities
may have had on children.

Does better test administration lead to higher scores? Previous research
reported by Taylor and White (1982) demonstrated that students who took
standardized tests from teachers who had been trained in proper test
administration obtained higher test scores than students who had not. In some
ways, it is logical that better test administration would result in higher
scores. For example, high quality test administration would mean that
teachers would give better directions, would be better at keeping students on
task, and would prepare students better for taking the test. All of these
things would probably lead to higher test scores. Alternatively, however,
better test administration could lead to lower scores if it
reduced cheating and eliminated unfair teacher assistance or
hints. If, as a result of training in test-taking skills and better test
administration procedures, students' scores improve more than would have been
predicted from a "practice effect," one can be relatively confident that the
previous scores were not valid indicators of what the students know. However,
if students receive lower test scores, it is unclear whether the later test
scores are more or less valid. Determining the degree to which scores are
valid is a time-consuming and complex process which, given these results, lies
beyond the scope of the project.

Student fatigue or overconfidence. It is possible that the student
training implemented in E1 and E2 actually resulted in students becoming
fatigued with taking tests, or so overconfident in their ability to take tests
that they did not perform as well on the actual standardized achievement test.
Particularly because of the scheduling problems, which resulted in the students
receiving 4-5 reasonably long practice tests in the last two months, students
in E1 and E2 may have become "desensitized" to the importance of the
standardized achievement test and performed below their true level of
achievement.

Suggestions for Future Research

The results from this study do not demonstrate that the use of these
materials results in more valid standardized achievement test scores or better
performance or attitudes on the part of students. According to the measures
designed for this study, they do suggest that teachers who have participated
in the project have better attitudes toward standardized tests and are more
capable test administrators. However, because the results concerning student
performance contradict the conclusions of previous research in related areas,
and because of teachers' perceptions that the materials were beneficial, it is
suggested that the results of this project not be taken as the final word.
Further research should be conducted. For that research, several
suggestions are made based on the data collected during this study.

1. Practice tests should be revised so that less time is spent giving
directions and going over sample items. This could be accomplished
by having each practice test include only one or two subtests. In
addition, it might be worthwhile to create a one-to-one
correspondence between the concepts taught in the filmstrip and
those practiced in the practice test.

2. Future studies should be designed so that the effects of
reinforcement, teacher training, student training, and practice tests
can be examined independently of each other. This type of design
would make any results easier to understand. Of course, this type of
design requires more students and costs more money (all other things
being equal).

3. The filmstrips and student training packages should be redesigned
into smaller components of no more than 15 to 20 minutes per training
session, with additional practice on each of the concepts taught. Also,
substantial time should be invested in taking smaller groups of students
through the filmstrips before testing them on a large population. Such a study should run
over several years. One of the main problems with the current
project was trying to do extensive curriculum development work while
simultaneously conducting a large-scale field study.
4. The "reinforcement" procedures used in the study need to be
completely re-examined and perhaps reconceptualized. The impetus for
this work was the study reported by Taylor and White (1982), in which
students were paid money for their performance above that which would
have been predicted from their pretest score on a standardized
achievement test. That work should probably be replicated first, to
determine whether motivation is as large a factor as it appeared in
the Taylor and White study. If indeed it can be demonstrated that
motivational factors are an important consideration in the validity of
standardized test scores, then alternative ways of motivating
students to perform on standardized tests should be found. As
the present study was designed, it was impossible to
estimate the effects of reinforcement separately to determine whether
or not the procedure was actually reinforcing.
5. The training materials should be tried with children at different
grade levels. Second grade children were chosen for this study
because we wanted to train students as near as possible to the
beginning of their standardized testing experience, before they had
learned "bad habits" which would have to be unlearned. However, the
concepts taught may have been too complex for second graders, or the
emphasis on test taking at such an early age may have made them more
anxious.
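One conventional way to meet the suggestion above that reinforcement, teacher training, student training, and practice tests be examined independently is a full 2 x 2 x 2 x 2 factorial design. The Python sketch below is a hypothetical illustration, not a design proposed in the report: it enumerates the 16 treatment combinations and estimates each component's main effect as the mean outcome of cells that include it minus the mean of cells that do not.

```python
from itertools import product
from statistics import mean

FACTORS = ("reinforcement", "teacher_training", "student_training", "practice_tests")


def design_cells():
    """All 16 on/off combinations of the four components (a 2^4 factorial design)."""
    return [dict(zip(FACTORS, levels)) for levels in product((0, 1), repeat=len(FACTORS))]


def main_effect(cells, outcomes, factor):
    """Mean outcome of cells with the factor minus mean outcome of cells without it."""
    on = [y for cell, y in zip(cells, outcomes) if cell[factor] == 1]
    off = [y for cell, y in zip(cells, outcomes) if cell[factor] == 0]
    return mean(on) - mean(off)


cells = design_cells()
# Hypothetical cell means in which only student training raises scores (by 2 points).
outcomes = [50 + 2 * cell["student_training"] for cell in cells]

for factor in FACTORS:
    print(factor, main_effect(cells, outcomes, factor))
```

Each added two-level component doubles the number of cells, which is why, as noted above, such designs require more students and cost more.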
The most important conclusion from these suggestions, however, is that
further research is necessary to understand to what degree typically
administered standardized achievement tests are valid and useful for the
purposes for which they are usually used. The materials developed in this
project represent an important beginning. As they are used further and more
data are collected, we will be able to better understand the degree to which
results from standardized tests should and can be used to make programming,
evaluation, anc ,,, ;ut sions for primary grade children. 
Factors Affecting
Standardized Test Results
Reinforcement

Author(s):
Abbreviated Title:
Code:

Item                               Code   Description
1. SUBJECTS
   1a. Number of Subjects                 1 = 12 - 29     2 = 30 - 100    3 = over 100
   1b. Mean Age                           1 = 4 - 6       2 = 7 - 10      3 = 11 - 23
   1c. Mean IQ                            1 = 43 - 85     2 = 86 - 100    3 = over 100
2. INDEPENDENT VARIABLE
   2a. Reinforcer                         1 = money       2 = candy       3 = praise
                                          4 = reproof     5 = token       6 = choice
                                          7 = prize
   2b. Schedule                           1 = immediate-item    2 = immediate-subtest
                                          3 = delayed
   2c. Contingency                        1 = contingent    2 = noncontingent
3. DEPENDENT VARIABLE
   3a. Type of Test                       1 = academic    2 = intelligence
   3b. Administration Unit                1 = individual    2 = group
4. DESIGN
   4a. Type of Design                     1 = true experimental    2 = quasi-experimental
                                          3 = pre/post
   4b. Quality of Design                  1 = high    2 = low
5. EFFECT SIZE
6. CONCLUSIONS                            1 = treatment worked    2 = some question
                                          3 = treatment did not work

Factors Affecting
Standardized Test Results
Student Training

Author(s):
Abbreviated Title:
Code:

Item                               Code   Description
1. SUBJECTS
   1a. Number of Subjects                 1 = 9 - 49       2 = 50 - 99      3 = 100 - 199
                                          4 = 200 - 705    5 = over 1000
   1b. Mean Age                           1 = 5 - 10       2 = 11 - 14      3 = 15 - 18
                                          4 = 19 - 24      5 = 25 - 40
   1c. Mean IQ                            1 = 65 - 89      2 = 90 - 114     3 = 115 - 120
2. INDEPENDENT VARIABLE
   Type of Training                       1 = practice     2 = testwiseness
3. DEPENDENT VARIABLE
   3a. Type of Test                       1 = achievement  2 = IQ
   3b. Administration Unit                1 = individual   2 = group
4. DESIGN
   4a. Type of Design                     1 = true experimental    2 = quasi-experimental
                                          3 = pre/post
   4b. Quality of Design                  1 = high    2 = low
5. EFFECT SIZE
6. CONCLUSIONS                            1 = treatment worked    2 = some question
                                          3 = treatment did not work
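Coding a study on these forms amounts to mapping each reported value onto the listed category codes. A minimal Python sketch for item 1a (Number of Subjects) on the Student Training form follows; the function and constant names are hypothetical. As printed, the form gives no code for 706 to 1000 subjects (or for fewer than 9), so the sketch returns None in those cases rather than guessing.

```python
from typing import Optional

# Item 1a ("Number of Subjects") on the Student Training coding form:
# (code, lowest count, highest count), inclusive bounds as printed on the form.
SUBJECT_RANGES = [(1, 9, 49), (2, 50, 99), (3, 100, 199), (4, 200, 705)]


def code_number_of_subjects(n: int) -> Optional[int]:
    """Return the item-1a code for a study with n subjects, or None if the form has no code."""
    for code, low, high in SUBJECT_RANGES:
        if low <= n <= high:
            return code
    if n > 1000:
        return 5  # "over 1000"
    return None  # fewer than 9, or 706-1000: not covered by the printed form


for n in (25, 150, 800, 1200):
    print(n, code_number_of_subjects(n))
```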
SUMMARY OF DATA FROM STUDIES ON
REINFORCING TESTING BEHAVIOR 

1 v 



ID 


ES 


Quality 


IQ 


01 


72 


2 


• 

47 


' 6& ' 


2 


93 




92 


2 


43 


02 


35 


2 


106 


- 25 


2 


- 106 


03 


11 




100 


04 


- 20 


1 


119 


- 26 


1 


- 102 




269 


1 


80 




- 03 


1 


119 




15 


1 


96 


.. . 


23 


1 


78 




69 


1 


100 


14 


1 


100 


UD 


165 


I 


82 


u/ 


25 


2 


99 




08 


2 


100 


DQ 


87 


2 


102 


JLU 


95 




63 


11 


81 


1 


65 


-79 — 


• 


65 


1 ? 


45 




90 


23 




90 




54 ■ - 




90 




. . , „ 12 




90 




16 




90 




41 




90 




11 




100 




79 


* '1 


100 




16 


1 


100 


14 - *' 
IS 


23 




115 


12 




100 




38 




100 




160 




100, 


" 16 


06 




108 


06 




' 108 




06 1 




.108 


17 


29 


2 


114 






94- 




95 


2 






1,12 


2 . 


76 


18 


136 


. ■ •<' 2 


108 
TRAINING STUDENTS IN TW
10



ES 



Quality 



01 

02 
03 
04 
05 
06 
07 

08 
09 

10 
-11 
12 

13 
14 

15 

16 
17 



18 
-20 



14 
16 
78 
69 
39 
404 
197 
67 
49 
84 
233 

- 27 
20 
53 
32 
02 
08 
13 

- 08 
05 
05 
72 
54 
18 
12 

. 74 
97 
31 
43 
78 



21 






28 
13 


22 " 






13 
37 

•29 


23 - 






36 
15 


24 * 






3d 


25 




i 


03 


26 






06 


27 




1 


20 


28" 




i 

1 


119 

58 
82 
72 


29 




l ■ 


109 


30 


/ 




23 


31 






183 


32 


/ 

/ 




84 N 
73 
69 


33 






56 


34 






48 


.35 






142 








78 






\ 


138 


. / 






126 
138 








110 


37/ 






21 


/ 






24 






21 



1 

1 

2 
1 
2 
2 
2 
2 
2 
1 
2 
2 
1 
2 

2 

«* 

C\ 

2 
2 
2 
1 
1 
1 
2 
2 
2 
2 
2 
2 
2 
2 
1 
1 
2 
2 
2 
1 
1 
2 

1 • 

i 
l 

2 
2 
2 
2 
.1 
1 
2 
2 
2 
2 
2 
- 2 
2 

~~2~ 
' 2 
2 



. 2 
2 

/ 2 
Appendix B

Materials Related to Development of Filmstrips

1. Letter from Northwest Regional Educational
Laboratory about Frequency with which Different
Tests Are Used by Title I Projects in Utah

2. Information on Frequency with which Different
Tests Have Been Adopted by States and Districts

3. Form Used in Analyzing Standardized Test for
Developing Training Objectives
Title I Evaluation
Technical Assistance Center
Regions 8, 9, 10



November 16, 1981 



Ms. Cie Taylor
Utah State University 
Logan, UT 

Dear Cie: 

To find an answer to your question on the tests most frequently used, I
contacted David Kaskowitz of RMC Corporation, who is responsible for the
national analysis of Title I data. The results of his preliminary analysis
show project utilization of tests in this order:

California Achievement Test
SRA
Metropolitan Achievement Test
Gates-MacGinitie
Stanford Achievement Test
Iowa Tests of Basic Skills

Dr. Kaskowitz stressed that the order might change with further analyses
and could be quite different when the numbers of students within a project
are taken into account. Also, he mentioned that the frequencies associated
with the first four tests were similar, with a gap between them and the
last two.

Do phone again, should you have further questions.

Cordially,




Mary Quilling
Senior Research Associate 
Title I Evaluation 
Technical Assistance Center 



cc: Kathy Stewart



Northwest Regional Educational Laboratory
[4©0 5.W. Sixth Avenue • Portland, Oregon 97204 • Telephone (5 



AN EQUAL OPPORTUNITY EMPLOYER 



MID-CONTINENT Region



CAT Districts 

St. Louis 

Cincinnati 

Omaha

Detroit

Minneapolis

DeKalb, IL

Lincoln 

Columbus 



CTBS Districts 
Cleveland 



CAT States 



None 



CTBS States 



Wisconsin - 4,8,11 
Kentucky - 3, 5, 7, 10



Other States
Iowa - ITBS 3-8



Other Districts 

Milwaukee

Chicago

Kansas City

Cleveland

Indianapolis

Chicago

Arch Diocese
Iowa

Des Moines
Flint
Toledo
Wichita



ITBS
ITBS
ITBS
MAT

ITBS
ITBS

ITBS 3-8
ITBS/MAT
SRA
ITBS
ITBS



DISTRICT TOTALS 



CAT 
CTBS 
Other 
ITBS 



9 
1 
3 
9 



STATE TOTALS 



CAT 

CTBS 

Other 



0 
2 
1 
WESTERN Region



CAT Districts 

Long Beach 
Fresno 
Bakersfield
Santa Ana
Seattle 
Spokane 

Salt Lake City 

Phoenix 

Tucson 

Clark Co. , NV 
Pasadena 



CTBS Districts 



Los Angeles 
Sacramento 
San Diego 
San Francisco 
Garden Grove 
San Jose 
Albuquerque 
Denver 

Jefferson Co., CO

Los Angeles Arch Diocese

Oakland

Tacoma 



Other Districts 



Clark Co., NV

Granite, UT

San Juan, CA
Portland



MAT 
SAT 
ITBS 

Own Test 



CAT States 

Washington - 4 
Arizona - 1-12 



CTBS States 



Utah - 4-8 Sample
New Mexico - 5,8,11 



Other States 

Hawaii - SAT 2, 4, 6, 8, 10



DISTRICT TOTALS 



CAT

CTBS 

Other 



11 
12 
4 



STATE TOTALS 



CAT 

CTBS 

Other 



2 
2 
1 



SOUTHERN Region

CAT Districts

Memphis 

Oklahoma City 

Charlotte-Mecklenburg, NC

Akron 

Atlanta 

Birmingham 

Corpus Christi

Mobile

Caddo Parish

El Paso 



CAT States 

Alabama 1-12 
Mississippi - 4,6,8 
North Carolina - 3,6,9 
Texas - 6 Sample 
Oklahoma - 6-9 Sample 



CTBS States

South Carolina - 4,7,10



CTBS Districts 

Jefferson Co., KY
Jefferson Parish, LA
Broward Co., FL
Brevard Co., FL
Hillsborough Co., FL
Orange Co., FL
Tallahassee 
New Orleans 
Charleston, SC 
Nashville 



Other States 
None 



DISTRICT TOTALS 



Other Districts 

Dallas ITBS 
Houston ITBS

Dade Co., FL SAT

Miami Diocese SRA

Orange Diocese SRA
Tallahassee

O. Diocese SRA

Tampa Diocese SRA
Jacksonville

Diocese SRA

New Orleans

Diocese SRA

Jacksonville SAT

Tulsa SRA

Pinellas Co., FL SRA

Fort Worth ITBS

Palm Beach SAT



CAT 

CTBS 

ITBS 

SRA 

SAT 



10 
10 

3 

8

3 



STATE TOTALS 

CAT 5 
CTBS 1 
Other 0

EASTERN Region



CAT Districts 

New York City 
Pittsburg 

Pittsburgh Diocese 
New Castle Co., DE
Philadelphia
Baltimore
Montgomery Co., MD
Prince George Co., MD
Jersey City
Kanawha Co.



CAT States 

Delaware - 1-8,11 
Maryland - 3-5-8 



CTBS States 



West Virginia - 3-6-9 



CTBS Districts 

Washington, D. C. 
Newark, NJ 



Other States 

Virginia - SRA

Rhode Island - ITBS 4,8 



Other Districts 

New York Arch Diocese SRA

Newark Diocese SRA 

Brooklyn Diocese SRA 

Norfolk SRA 

Richmond SRA 

Rochester MAT



DISTRICT TOTALS 



CAT 
CTBS 

SRA 
MAT 



10 
2 

5 
1 



STATE TOTALS 



CAT 

CTBS 

Other 



2 
1 
2

GRAND TOTALS

          DISTRICTS              STATES

CAT              39       CAT         9
CTBS             26       CTBS        6
                 65                  15

MAT/SAT           7       ITBS        2
ITBS             13       SAT         1
SRA              14       SRA         1
Own Test          1
                 35

TEST          LEVEL          SUBTEST          REVIEWER

For items I - IV, indicate if the words are written in the test booklet;
otherwise, oral will be assumed to be the mode.

I. Difficult vocabulary from directions (individual words).

II. Difficult directions (phrases).

III. Series of directions (in steps).

IV. New symbols.

V. Examples of different response formats (from test booklet).

Appendix C

1. Number of Minutes and Items for Each Practice Test
in Participating Districts

2. Reading Series Used in Participating Districts
Upon Which Practice Tests Were Based

3. Strategies Used to Construct Distractors for
Practice Tests

4. Practice Test Directions for Experimental Group I
for test #5
Number of Items and Minutes for Each Subtest of the
Seven CTBS Practice Tests Used in Cache School District

Entries are items / minutes.

Practice Tests 1-5
Subtest                      1          2          3          4          5
Reading Vocabulary           12/5.52    12/5.52    9/4.14     9/4.14     12/5.52
Reading Comprehension:
  Sentences                             6/5.22     6/5.22     6/5.22     8/6.96
  Paragraphs                                       4/4.67     4/4.67     7/8.17
Totals                       12/5.52    18/10.74   19/14.03   19/14.03   27/20.65

Practice Tests 6 and 7: CTBS (1981) subtests
Subtest                        6          7
Word Analysis
  A. Consonant Sounds          3/2.25     3/2.25
  B. Vowel Sounds (Auditory)   2/1.5      2/1.5
  C. Vowel Sounds (Visual)     6/4.5      6/4.5
  D. Word Identification       2/1.5      2/1.5
  E. Syllables                 2/1.5      2/1.5
  F. Root Words                2/1.5      2/1.5
  G. Compound Words            2/1.5      2/1.5
Vocabulary
  A. Synonyms                  4/3.0      4/3.0
  B. Sentence Completion       2/1.5      2/1.5
Reading Comprehension          8/8.8      8/8.8
Totals                         33/27.55   33/27.55
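As a consistency check, the per-subtest entries for each CTBS practice test can be summed and compared with the Totals row. In this Python sketch the dictionary simply transcribes the items/minutes pairs from the Cache School District table; the variable and function names are illustrative.

```python
# Items and minutes for each subtest of the seven CTBS practice tests
# (Cache School District), transcribed from the table above.
practice_tests = {
    1: [(12, 5.52)],
    2: [(12, 5.52), (6, 5.22)],
    3: [(9, 4.14), (6, 5.22), (4, 4.67)],
    4: [(9, 4.14), (6, 5.22), (4, 4.67)],
    5: [(12, 5.52), (8, 6.96), (7, 8.17)],
    # Tests 6 and 7 use the CTBS (1981) word analysis, vocabulary,
    # and reading comprehension subtests.
    6: [(3, 2.25), (2, 1.5), (6, 4.5), (2, 1.5), (2, 1.5), (2, 1.5), (2, 1.5),
        (4, 3.0), (2, 1.5), (8, 8.8)],
}
practice_tests[7] = list(practice_tests[6])


def totals(subtests):
    """Sum items and minutes across a test's subtests (minutes rounded to 0.01)."""
    items = sum(n for n, _ in subtests)
    minutes = round(sum(m for _, m in subtests), 2)
    return items, minutes


for test in sorted(practice_tests):
    items, minutes = totals(practice_tests[test])
    print(f"Test {test}: {items} items, {minutes} minutes")
```

The computed totals (12/5.52, 18/10.74, 19/14.03, 19/14.03, 27/20.65, 33/27.55, 33/27.55) match the printed Totals row.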
Number of Items and Minutes for Each Subtest of the
Seven ITBS Practice Tests Used in Nebo School District

Entries are items / minutes.

Subtest                        1        2        3        4        5        6*       7*
VCB 1. Picture Identification  9/4.5    6/3.0                               6/3.0    6/3.0
VCB 2. Definition                       8/4.0    6/3.0                      4/2.0    4/2.0
WA 1. Initial Sound (Picture)           4/1.2    4/1.2                      2/.6     2/.6
WA 2. Initial Sound (Word)              4/1.2    4/1.2                      2/.6     2/.6
WA 3. Final Sound (Picture)                      4/1.2    4/1.2             2/.6     2/.6
WA 4. Final Sound (Word)                         4/1.2    4/1.2             2/.6     2/.6
WA 5. Sound Substitution                         4/1.2    4/1.2             4/1.2    4/1.2
WA 6. Silent Letters                                      4/1.2    4/1.2    2/.6     2/.6
WA 7. Middle Consonants                                   4/1.2    4/1.2    2/.6     2/.6
WA 8. Vowel Sounds                                        4/1.2    4/1.2    4/1.2    4/1.2
WA 9. Long/Short Vowels                                   4/1.2    4/1.2    4/1.2    4/1.2
WA 10. Endings                                                     4/1.2    2/.6     2/.6
WA 11. Compound Words                                              4/1.2    2/.6     2/.6
Picture Description                              6/3.0    6/3.0    9/4.5    12/6.0   12/6.0
Sentence Understanding                                    4/1.6    8/3.2    10/4.0   10/4.0
Stories                                                            8/4.0    14/7.0   14/7.0
Totals                         9/4.5    22/9.4   32/12.0  38/13.0  49/18.9  74/30.4  74/30.4
Time limit (minutes)           5        10       15       15       20       30       30

*Use sample items only as indicated in the test.
Number of Items and Minutes for Each Subtest of the
Seven SAT Practice Tests Used in Granite School District

Entries are items / minutes.

Subtest                                     1        2        3        4        5        6        7
Vocabulary                                  9/4.86   7/3.78                     7/3.78   13/7.02  13/7.02
Reading A (Picture Identification)                   12/5.40  12/5.40           12/5.40  12/5.40  12/5.40
Reading B (Sentence Completion)                               10/5.20  10/5.20  8/4.16   13/6.76  13/6.76
Word Study Skills A (Word Identification)                     13/4.29  13/4.29  8/2.64   13/4.29  13/4.29
Word Study Skills B (Sound Discrimination)                             12/5.16  12/5.16  12/5.16  12/5.16
Totals                                      9/4.86   19/9.18  35/14.89 35/14.65 47/21.14 63/28.63 63/28.63

Number of Items and Minutes for Each Subtest of the
Seven MAT Practice Tests Used in Logan School District

Entries are items / minutes.

Subtest                                    1         2         3         4         5         6         7
Word Knowledge A (Picture Identification)  15/5.25   15/5.25                       6/2.10    15/5.25   15/5.25
Word Knowledge B (Definition)                        12/6.24   12/6.24             8/4.16    12/6.24   12/6.24
Word Analysis                                                  12/5.16   12/5.16   12/5.16   12/5.16   12/5.16
Reading Sentences                                              7/3.78    7/3.78    4/2.16    7/3.78    7/3.78
Reading Stories                                                          7/5.18    9/6.66    14/10.36  14/10.36
Totals                                     15/5.25   27/11.49  31/15.18  26/14.12  39/20.24  60/30.79  60/30.79



EXPERIMENTAL GROUP I 
CLASSROOM TEXT INFORMATION 



Teacher 



Series 



Level 



Title 



V. Jenkins 



L. Murray 



C. Nielsen 



P. Jensen 



P. Kane 



G. Kunz 



E. Archer 



A. Norris 



Holt 



Holt 



Holt 



Oistar 



Houghton-Mifflin 

Distar 
Houghton-Mifflin 



Distar 



Houghton-Miffl in 
S. Waldram Distar 



Houghton-Mifflin 
Ginn 



Ginn 



B. J. Crockett Distar 



Ginn 



7 
9 

7 
9 

7 
9 

I 
I 
6 

I 
I 
6 



I 
6 

I
I



3,5,6 



5,6,7 

II 
II 
6 



A Place For Me 
People Need People 

A Place For Me 
People Need People 

A Place For Me 
People Need People 

Fast Cycle 
Book B 
Secrets 

Fast Cycle 
Book B 
Secrets 

Fast Cycle 
Book C 

Secrets

Fast Cycle 
Book B 
Secrets 

A Duck Is A Duck 
May I Come In
One to Grow On

The Dog Next Door

Fast Cycle

Book C 

One to Grow On 
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District School 
Granite Redwood 



Nebo 



Westside 



Teacher        Series                       Level   Title

V. Latham      Distar                       II      Book A
                                            II      Book C
               Ginn                         6       One to Grow On

E. Banks       Distar                       I       Book C
                                            II      Book A
               Ginn                         7       The Dog Next Door

C. Borden      Distar                       I       Book B
                                            II      Book A
               Ginn                         6       One to Grow On

V. Gomez       Distar                       I       Book A
                                            II      Book A
               Ginn                         7       The Dog Next Door

S. Green       Ginn                         7       The Dog Next Door
                                            8       How It Is Nowadays

L. Lobb        Distar                       II      Book A
               Ginn                         8       How It Is Nowadays

F. Martin      Distar                       I       Book C
                                            II      Book B
               Ginn                         7       The Dog Next Door

M. Anthony     Harcourt Brace Jovanovich    5       Together We Go
                                            6       A World of Surprises

M. Willis      Harcourt Brace Jovanovich    5       Together We Go
                                            6       A World of Surprises

A. Burbidge    Harcourt Brace Jovanovich    5       Together We Go
                                            6       A World of Surprises
                                            7       People and Places

M. Payne       Harcourt Brace Jovanovich    5       Together We Go
                                            6       A World of Surprises
                                            7       People and Places
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EXPERIMENTAL GROUP II 
CLASSROOM TEXT INFORMATION 






District 


School 


Teacher 


Series 


Level 


Title 


iscne 


rdrK 


l . laggari 


Holt


7

9


A Place For Me
People Need People 






L. Talbot 



Holt 


9 

11 


People Need People 




Lewiston


P. Mieure 


McMillan 
Holt 


7A 
9 


On Wings of Words 
People Need People 






M. Schenever 


McMillan 


6 
7 

7A 


Worlds of Wonder 
Lands of Pleasure 
On Wings of Words 


Granite 


South Kearns 


M. Franco


Distar 
Ginn 


II 
7 
8 


Book A 

The Dog Next Door 
How It Is Nowadays 






G. Madsen 


Distar 
Ginn 


II 

7 
8 


Book A 

The Dog Next Door 
How It Is Nowadays 






E. Zagarella 


Distar 
Ginn 


II 
II 

8 


Book A 
Book B 

How It Is Nowadays 




Stansbury 


B. Hunt


Distar 
Houghton-Mifflin 


II
6 
7 


Fast Cycle 

Secrets 

Rewards 






M. Miller


Houghton-Mifflin 


4 
7 

o 

0 


Rainbows 
Rewards
Panorama






L. Sorensen 


Distar
Houghton-Mifflin 


II 

. 1 

8 


Book B 

Rewards 

Panorama 






District School 



Teacher 



Series 



Level 



Title 



Granite Stansbury 0. Wallace Distar 



Western 
Hills 



Nebo 



Goshen 



Wilson 



!. Cannon 



J. Eber



M. Turner



R. Boyack 



L. Neff 



Houghton-Mifflin

Distar 
Houghton-Mifflin 

Distar



Houghton-Mifflin 
J. Schmidt Distar 



Houghton-Mifflin 
D. Tanner Distar 



Houghton-Mifflin 

Distar

Houghton-Mifflin 

Lynn & Bacon 
Scott Foresman 
Scott Foresman 

Lynn & Bacon 
Scott Foresman 
Scott Foresman



D. Altenberg Houghton-Mifflin 
Houghton-Mifflin 
Houghton-Mifflin 

M. Anderson Houghton-Mifflin
Houghton-Mifflin
Houghton-Mifflin



Specia 



\ Specia 



Fast Cycle 
Book A 
Rewards 

Fast Cycle
Book C 
Spinners 

Fast Cycle 
Book B 

Spinners/Towers/Skylights 

Fast Cycle 
Book D 



(2.5) 2,5 (?) 

Fast Cycle 
Book C 
Towers 

Fast Cycle 
Book C 
Towers

Primer At Home and Away
0 Calico Capers

Daisy Days 

Primer At Home and Away
0 Calico Capers 

Daisy Days 

Honeycomb 
Clover Leaf 
Sunburst 

Honeycomb
Clover Leaf 
Sunburst 



SAT

TEACHER'S KEY
PRACTICE TEST #5

Vocabulary

Reading A

Reading B

Word Study A

Word Study B




1. b 


4 

L 




i' 


i. b 




a 


1. ^ . 


■L . b ' 


2. a 


2. 


b 




2 ~ 

L • 


3. b 


3. 


3. 


G ' 




3 ~ 




4. 


4. 


b 


4. j>J 


4.31 


5. c 


5. a ; 


5/ 


d 




5 a 


6. a , 


.. 6.. b 


f.. 


a 


6. 


" 6 ~ 


7. 


7, b 


7. 




7. 




8. 


8. ^ b 


8. 




8. 


' 8 

o. 


9. 


' 9. c 


___ 




,0 

'J 4 


a ~a~ 


10. 


10. 


10. 




10. 


10. b 


11. 


.11. 


11. 




11. . 




12. 


, 12. 


, 1?. 




12'. 


12. 


13. 




■ ' 13. 




13. 
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SAT 
VOCABULARY 



SAY GROUP RESPONSE 



1


Demonstrate.
Check fingers.


Turn to page 1.

Put your finger on page number 1. This is a
Vocabulary test. This vocabulary test will show
how many words you know.


What is this?

What will this vocabulary
test show?


2 




Put your finger on the sample:
girl, boy, boot.


Read the words out loud
with me.


3 


First, I will read part of a sentence. Then I'll
read three words.

You will have to find which of the three words 
completes the sentence. 


What will I read first? 
Then what will I read? 


4 


Let's try the sample. Listen to the sentence.
A young man is a ______.


Try each word to find which
word completes the sentence.
Try girl.

A young man is a girl.
Is it right?

Try boy.
A young man is a boy.

Is it right?
Try boot.

A young man is a boot.
Is it right?


girl, boy, boot.


5 




Which word completes the sentence?
Yes, boy. You can see the space under boy
has been marked.




6 




We will do all the items on this page the same way.




7 


Check fingers.


Put your finger on item number 1.

Listen for the word that best completes the
sentence.


Read the words to yourself.






DO 



Make sure all stu- 
dents made a mark. 



Wait 10 seconds
between items.
Repeat #8 for these
items. Say each
sentence twice.



SAT 

VOCABULARY (continued) 
SAY 



GROUP RESPONSE 



1. You live at ______.     school   dinner   home
   You live at ______.     school   dinner   home

2. A type of fruit is an ______.   ape     apple   acorn
3. A short sleep is a ______.      nut     nap     snap
4. We get wool from ______.        sheep   sleep   fur
5. A smile is a ______.            snarl   grim    grin
6. Seven days make a ______.       week    weed    weak
7.
8.
9.
10.
11.
12.
13.


Mark the answer space.
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SAT 
READING A 



DO 



Check fingers. 



Check fingers. 



Check fingers. 



Check marks. 



Check fingers. 
Check marks. 



SAY 



Turn to page 2 



Put your finger on page number 2. This is a
Reading test.



This reading test asks you to find words that go
with the picture.



Look at the sample and put your finger on the
picture of the hut. There are three lines under
the picture. There is one right answer in each
line and there are three words to choose from.



Finger on A.

hut, hum, room

Which word tells about the picture?

Yes. See, the space under hut has been marked.



Finger on B. 

loft, house, hound.
Which word tells about the picture?
Yes, mark the space under the word house.



Finger on C. 
fix, hoe, home

Now mark the space under the word that tells about
the picture. You should have marked under home.



Now you will do the rest of the items on this page 
just like the sample. When you get finished, go 
back and check your work. 



GROUP RESPONSE 



What is this? 



This reading test asks you
to find words that go with...



How many words in each line? 
How many right answers in 
each line? 



Read the words with me. 



Read the words with me. 



Read the words with me.



What do you do when you're
finished?
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00 



SAT
READING A (continued) 

SAY 



Record time.
Start:

Time:
Stop:



GROUP RESPONSE 



Finger on item number 1. Go.



Stop. 
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SAT 
READING B 



1 


Check fingers.


Turn to page 3. Put your finger on page number 3.
This is a Reading test. You will read sentences
and stories.

Then you will answer questions about the sentences
and stories.


GROUP RESPONSE
What will you read?


2


Check fingers.


Put your finger on the sample.

We saw a bright colored bird at the zoo.

The color of the bird might be ______.


Read the sentences with me.


3 

Check fingers.


Now you will pick a word to finish the sentence.

Finger on A.

red, pale, gray, dull.


Read the words. 


4 




Which word tells the color of the bird?




5 




Yes, the word red has been marked.




6 


Check marks. 


The next sentence reads ...
The bird was in the ______.


Read the sentence with me.


Now, we will read the words beside B.

soup, hat, zoo, animal

Which word tells where the bird was?

Mark the word you think finishes the sentence.

You should have marked zoo.


7 




Now you will do the rest by yourself. When you
get finished, go back and check your work.


What do you do when you're
finished?


8 


Record time. 
Start:

Time 4:00
Stop:


Finger on item number 1. Go. 




9 




STOP.
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SAT > 
WORD STUDY SKILLS A



DO    SAY    GROUP RESPONSE



1 


Check fingers. 


Turn to page 4. Put your finger on page number 4.
For this test you will find a word that I say.




2 


Check fingers. 


Put your finger on the sample, line A.
paste, past, patch.
Find the word paste. See, the space below
paste has been marked.


Read the words out loud 
with me. 


3 


Check fingers. 
Check marks. 


Finger on line B. 

sick, six, sill.

Mark the word six, six.

Is six the first, middle, or last word?

Yes, the middle word.


Read the words with me. 


4 


Check fingers. 
Check marks. 


Finger on line C.

pill, peg, pig.

Mark the word pig, pig.

You should have marked the last word.


Read the words with me.


5 




Now we will do all of the items on this page
together. Remember to mark only the word I say.




6 


Check fingers. 


Finger on item 1.
Mark wept, wept.


Read the words to yourself.


7. 


Wait 10 seconds
between items.


Item 2. Mark reach, reach.




8 


Repeat #7 for these
items. Say each
word twice.


3. wheel     7.      11.

4. little    8.      12.

5. groat     9.      13.

6.           10.
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DO 



Check fingers. 



Check fingers. 



Point to 3 words. 



Record time. 
Start : 
Time 5:00
Stop : 



SAT 

WORD STUDY SKILLS B
SAY



Turn to page 5. Put your finger on page number 5.
This is a test on the sounds that letters make.



Put your finger on the sample.
Read the first word to yourself.

has a line under it.

Listen for the sound of the underlined letter(s).



Now you will find the underlined sound in one of 

these three words. 

Read the three words to yourself. 



Which word has the underlined sound?
Is it city, goal, or rail?



Yes, city. The space under city has been
marked.



GROUP RESPONSE 



Now read it out loud.

What letter(s) is underlined?

What sound is underlined?



Now read the words out loud.



Say the underlined sound
again.

Remember to listen for the
underlined sound, not letter.



Now you will do the rest by yourself. When you
get finished, go back and check your work.



Finger on item number 1. Go.



STOP. 



What do you do when you're
finished?
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Vocabulary 



SAMPLE:   a girl     b boy (marked)   c boot

1    a school   b dinner   c home
2    a ape      b apple    c acorn
3    a nut      b nap      c snap
4    a sheep    b sleep    c fur
5    a snarl    b grim     c grin
6    a week     b weed     c weak
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8.A.T.-5-LMH 



TEST 2. 

Reading: Part A




TEST 2. 

Reading: Part B 



SAMPLE: 1 

We saw a bright colored bird at the zoo.
The color of the bird might be 



red pale gray 
a® bO cO. 
The bird was in the 



B. 



soup 
aO 



hat 

10- 



zoo 
cQ 



dj/Tl 
dQ 

animal 

dp 



He made a statue from a tree. 
The statue was made of 



1. 



wood 



metal 

bQ 



glass 
cO 



clay. 

dp' 



She put the basket on her head. 
She wanted to wear a 



2. 



dish 



hat 
bO 



dress 

c O 



shoe. 
dO 



Mary tries very hard, 
does her 



3. 



fast sleep 
aO bO 
to do a good 

fight ' job 



She always 
bean 



4. ; 



bO 



play 
cQ • 



Harry likes birds. He built a 
river truck ' cabin 



S ' aCT bO 
so he could keep a 



6. 



pigeon 

jO 



mouse 
bQ 



dog 



best 
quick. 

dQ ■ 



cage 
dO 



goat. 
dQ 



Word Study Skills: Part A 



r 



SAMPLE:





paste 


• past 


patch 


a. 


a® 




cO 




sick 


six 


sill 


b 


a° , 


b° . 


< c o 




pill 


■ peg 


pig 


c 


aQ 


. b° 


-.c° 



. 1 


wept 

a ° - 


wait 


weep 

c 0 


2 


read • 
a ° 


reach 

b° 


real 
c° 


3 


when
a° 


wheat 

b° .. 


wheel 
c° 


•4 


litter 

a o 


little

b° 


liter
c° 


5 


gloat
a° 


goat ' 

b° , 


glow 
c° 
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TEST 3. " 
Word Study Skills: Part B 



SAMPLE:
grass



S.A.T.-5-L



city 



goal 
bO 



rail 
cO 



T 


home 


house 
aO 


coal 

b o 


keep 


2 


face 


air
aO 


craft 


ran 
cO 


3 


paper 


wait 

a ° 


ant 

b° 


po c§ 

c 




trumpet 


out 
aO 


trick 

b° 


:s 


5 


crust


rake 
aO 


city 
bO* 


boil 
cO 


6 


raft


ram 
a° 


flip 

,o 


flat 
c° 


7 


greet 


sleet

a o 

a 


tip 
b° 


S gw 

c 


8 


sloppy 


go 


hot 
b° 


slip 
c° 


9. 


grouch 


witch 

a ° 


truck 

b° 


dish 
O 

c 


10 


vine


veil 


slide

*° 


foil 
O 

c 


11 


glue 


zoo


up 


build 
c° 
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Test Item Construction Strategies 



Vocabulary (Word Knowledge) Distractors

A. Picture Identification (match word with picture--nouns and verbs)

1. Initial Sounds (flower-flame, cup-cut, whisper-whistle-whisker)

2. Final Sounds (tear-near, goat-boat)

3. Word Appearance (bitter-butter-batter, number-notice, captain-capture)

4. Similar Sounds (sheep-sleep-geese, rug-rag, six-sick)

5. Similar Definition (hood-mask-helmet)

6. Similar Spelling (rocker-rocket, broad-board)

7. Related Words (pink-flower, cage-keep, drown-drift)

B. Simple Definition (match word with short definition--sometimes opposites)

1. Similar Forms (only-once, quiet-quit, delay-depart, confess-confuse)

2. Opposites (going-coming, alert-asleep, remain-leave)

3. Related Parts (stem-root-core, flock-nest)

4. Related Family of Words (grass-tree, dog-bird)

5. Incorrect Logic (invited means liked, adult means healthy) 

6. Similar Sounds (light-bright, might-right) 

Word Analysis Distractors 

A. Similar Appearance (love-liver, pear-peal-peat)

B. Reversal (evil-live) 

C. Similar Sounds (stuffy-fluffy-puffy) 

D. Prefixes (upset-inset-reset) 

E. Spellings (weigh-way, sear-seer, whether-weather, leaf-leave)

Reading Comprehension

A. Sentences 

1. Visual Discrimination (involving pictures of sentences) 

2. Understanding (action of noun in sentence) 

3. Common Sense (Fan is used to make air warmer?)

4. General Knowledge (Moon means it is night.)

5. Vocabulary (bubble, elephant, corner, flame) 

6. Logic (double the amount is twice as much) 

7. Inference (how would you feel) 

8. Relationships (brother, sister) 

B. Stories

1. Summary (title of story)

2. Sorting out details (who, what, where, when)

3. Inference (How did Sue feel?)

4. Common Sense (Do you get wet when bathing?)

5. General Knowledge (Is Sunday before Monday?)

6. Judgement (Are Giants good or bad?)

7. Vocabulary (What word in the story means ___?)

8. Conclusions
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Appendix D 

Materials Related to Reinforcement Procedures 

1. Sample Chart Used in Reinforcement Component

2. Example of a Completed Chart Used in Reinforcement Component
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Note: This chart has been reduced for binding purposes. The actual chart 
measures 8-1/2" X 14". 
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Appendix E 

Materials Related to Sample Selection and Description 



1. Sample Letter Sent to Principals to be Used to 
Inform Teachers About the Project 

2. Letter Sent to Inform Teachers of Assignment to 
Experimental Group I 
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SAMPLE LETTER TO BE SENT ^PRINCIPALS 



The Utah State Office of Education has been awarded a contract from the
U.S. Department of Education to develop, implement, and evaluate the
effectiveness of a project to improve the quality of data obtained from
standardized achievement tests. I have reviewed the description of the
project carefully and am convinced that our district would gain much by
participating. Most of the work for the project will be carried out by
researchers at Utah State University under the direction of Dr. Karl White.

The purpose of this study is to investigate the effects on standardized
test performance of the following:

1. Training students in test-taking skills.

2. Reinforcing students for trying their best on standardized tests. 

3. Familiarizing students with the format of the particular 
standardized test used in their district. 

A related project was conducted during the last two years by the State Office
of Education in conjunction with the researchers at USU. The results of this
previous project indicated that the above variables have a substantial effect
on the results of standardized test performance of elementary school
children. The current project will focus on standardized reading achievement
tests for second graders. The findings of the previous project will be used
to develop and evaluate a number of training packages and procedures. If the
project is successful, we will be able to be more confident that the results
of our standardized tests are an accurate reflection of what students do or
do not know.

Our district has agreed to participate in the research and has suggested
that your school (among others) be involved. Second grade teachers from each
of the participating schools will be asked to participate. Once it is
determined which teachers are willing to participate, the research procedures
require that they be randomly assigned to either an experimental or control
group. Those teachers assigned to the experimental group will be given
training in appropriate test administration techniques and will be trained in
how to assist in teaching their students appropriate test-taking and
motivational techniques. Those assigned to the control group will receive no
training. Data will be collected from all classrooms, but this will require
almost no time from the teacher.

Experimental group teachers will need to attend two workshops, one in 
early September, the other in early spring, to acquaint them with the 
research rationale and procedures. Since the first workshop will last a 
whole day and will be held on a Saturday, teachers will be paid an honorarium 
for attending. The research staff from Utah State University will work 
directly with the teachers in the experimental group to assist them in 
implementing the project. In addition, the research team members will be in 
monthly contact by telephone to offer any other assistance the teachers may 
find helpful.
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Sample Principal Letter 
Page 2 
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I think this project will provide valuable training to our students and 
teachers regarding standardized achievement test administration. Furthermore, the
study is important for developing methods which will increase the
validity of achievement test scores and provide a more accurate reflection of 
what students do or do not know. Therefore, I encourage you to support and 
participate in the project to the extent necessary. 

Enclosed in this packet are letters to be sent to the following second
grade teachers in your school:

1) ____________________

2) ____________________

3) ____________________

4) ____________________

5) ____________________

If you agree with me that our district should participate in this study,
please sign each letter and forward them as soon as possible to each of the 
teachers. Members of the project team will then be contacting each of these 
teachers by phone to determine which ones are able and willing to participate 
(I anticipate that a few teachers in the district will have legitimate 
reasons why they can't participate, but hope that there will not be many). 
Once we determine which teachers are able to participate, they will be 
randomly assigned to one of the experimental or control groups and the 
project will proceed. 

I would like to thank you in advance for whatever time and attention you 
are able to devote to this research. If you have any questions or for some 
reason think it would be better if your school did not participate, please 
contact me as soon as possible.

Sincerely,
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UTAH STATE U N I V E R S I T Y • L 0 G A N . U T A H 8 4 3 2 2 



801-750-1981 



UNIVERSITY AFFILIATED 
EXCEPTIONAL CHILD CENTER 
UMC 66 



September 8, 1981 



Salt Lake City, UT 84106 
Dear Ms. 

I am writing concerning the research project being conducted by the Utah State 
Office of Education in conjunction with Utah State University. As explained to 
you on the phone, it was necessary to randomly assign those teachers who were 
willing to participate in the project to various experimental and control groups 
in order to investigate the effects on standardized test performance of training
students in test-wiseness skills and reinforcing students for trying their best.
In consultation with your district staff, your school was assigned to the 
experimental group which will be implementing procedures for student training, 
reinforcement, and teacher training in test administration. 

A workshop will be held on September 12 starting at 9:00 a.m. at The Sirloin 
Stockade Restaurant located at 972 East 7200 South in Salt Lake City. Since the 
workshop will take place on Saturday, you will be paid an honorarium of $50 for 
attending. Lunch will be provided and you should plan on being finished by about
4:00 p.m. An agenda for the workshop is enclosed. 

To help us in getting the project off to a good start, there are a number of
things you need to bring to the workshop. These are listed below:

1. Reading Series Materials. As a part of the project, we will be preparing
practice tests for you to give to your students during the year. These
practice tests will be based on the Reading Series you are using in your
class. Therefore, please bring with you a copy of (a) the Teacher's Manual,
(b) the student text, and (c) the student workbook. If your class is using
multiple levels, please bring all levels with you. Also, we will need to use
these materials regularly during the year, so bring copies that we can keep
(if all teachers in the district use the same materials, we will only need
one copy of each level, but we can arrange that at the workshop).

2. List of class members. 

3. Results of WHAT? To get you into the swing of the workshop, we have attached 
an abbreviated copy of the Workshop Achievement T_est which will serve as your 
name tag for the workshop. Please complete the' test and bring it with you as 
per the " instructions. 
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It is very important that you attend this workshop, since it will explain and \ 
demonstrate all of the procedures and materials which will be used during the 
project. If something comes up that makes it impossible for you to attend, 
please contact me as soon as possible at (801) 750-2003.

On behalf of the State Office of Education and your school district
administration, I would like to thank you for your willingness to participate. I
know that as a teacher, you already have more to do than can reasonably be
expected, and your willingness to add another concern to your daily affairs (even
though this project will take relatively little time) is much appreciated. We
believe that the results of this project will do much to assist us in
understanding and making more accurate the results of standardized achievement
tests.



Sincerely, 



Karl R. White, Ph.D. 

Director, Planning & Evaluation 



KRW:mmt 
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1, Appendix F 

Materials Related to Implementation of Training Materials 
1. Filmstrip Evaluation Form

2. Project Evaluation Form




FILMSTRIP EVALUATION 
(Please send this to USU with your next Practice Test) 

School District Teacher 
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Filmstrip # Filmstrip shown on at ^_ 



Date Time 



Please rate the following statements according to this scale: 

Filmstrip

1. The length was appropriate. 1 2 3 4

2. The story line was entertaining to the students. 1 2 3 4

3. The content addressed skills the students need to learn. 1 2 3 4

4. The figures and printing on the filmstrip were clear. 1 2 3 4

5. The dialogue was audible. 1 2 3 4

6. The filmstrip turner was able to move with the narrated
page. 1 2 3 4

Teacher Involvement

7. The teacher was properly cued to stop the tape. 1 2 3 4

8. The amount of Owl/teacher interaction was appropriate. 1 2 3 4

9. The tasks required of the teacher were easy to
accomplish and defined clearly. 1 2 3 4

Student Materials

10. The student practice was sufficient for students to
apply the concepts they learned through the filmstrip. 1 2 3 4

11. The practice exercises were of the appropriate
difficulty level. 1 2 3 4

II. Please answer the following questions.

1. Have the students applied their test-taking skills to other subjects?
Yes No

In what way?



2. How long did it take you to prepare to teach this filmstrip?

3. Were there any concepts presented in the filmstrip that were not learned
by your students? Yes No

Describe:



4. Were you the teacher for the filmstrip? Yes No

5. Did you use the pictures that accompany the filmstrip? Yes No
How?



6. If you have any additional comments, please write them on the back of
this form.
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\ 

TEST-TAKING SKILLS PROJECT EVALUATION FORM

INSTRUCTIONS: Listed below are statements about each component of the project to which
we would like you to respond. Please circle the number that indicates how you feel
about each item. To save your time, we have not left space for you to write open-ended
comments. Instead, a member of our staff will soon contact you by phone for you to
summarize your comments about the best and worst aspects of each project component and
how the project could be improved. After the phone call, please return this form in the
enclosed envelope. Please be as candid and specific as possible so we will know which
parts are good and which parts need to be improved. Thank you.

                                              Strongly        Strongly
                                              Agree  Neutral  Disagree
FILMSTRIPS

1. Instructions for teachers were complete
   and easy to follow                         1    2    3    4    5
2. The filmstrips were easy to implement
   in the classroom                           1    2    3    4    5
3. The concepts taught in the filmstrips
   were important for students to learn       1    2    3    4    5
4. The filmstrips taught the concepts
   adequately                                 1    2    3    4    5
5. The students enjoyed the filmstrips        1    2    3    4    5
6. I plan to use the filmstrips in
   future classes                             1    2    3    4    5
7. The filmstrips were worth the time
   and effort required                        1    2    3    4    5

PRACTICE TESTS

8. Directions to students were complete
   and easy to follow                         1    2    3    4    5
9. Tests were easy to implement in the
   classroom                                  1    2    3    4    5
10. The test items were appropriate in
    terms of content and difficulty           1    2    3    4    5
11. The tests adequately prepared the
    students for standardized testing         1    2    3    4    5
12. I plan to use the practice tests in
    the future                                1    2    3    4    5
13. Students enjoyed taking the
    practice tests                            1    2    3    4    5
14. The practice tests were worth the
    time and effort required                  1    2    3    4    5

(over)
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CONTACT AND COMMUNICATION 



                                              Strongly        Strongly
                                              Agree  Neutral  Disagree

15. The USU contact person kept me well
    informed                                  1    2    3    4    5
16. I was able to reach my USU contact
    person and felt comfortable in doing so   1    2    3    4    5
17. My needs were responded to in a
    reasonable amount of time                 1    2    3    4    5
18. The contact person listened and
    responded to my feedback                  1    2    3    4    5

DATA COLLECTION

19. The observation during testing was
    non-disruptive                            1    2    3    4    5
20. I would not mind having observers
    again in a similar project                1    2    3    4    5
21. Students enjoyed responding to the
    student attitude measures on Friday       1    2    3    4    5

GENERAL IMPRESSIONS

22. The requirements for participation
    in the study were clearly outlined        1    2    3    4    5
23. The benefits were worth the
    investment of time                        1    2    3    4    5
24. The project was enjoyable for
    students                                  1    2    3    4    5
25. The project benefited students'
    test-taking ability                       1    2    3    4    5
26. The project enhanced students'
    performance in other areas                1    2    3    4    5
27. The project was realistic in scope        1    2    3    4    5
28. I am glad that I participated             1    2    3    4    5
29. The fall workshop adequately prepared
    me for the tasks expected                 1    2    3    4    5
30. Taking tests was less anxiety-provoking
    for students because of the project       1    2    3    4    5

REINFORCEMENT

31. The reinforcement procedures were
    easy for students to understand           1    2    3    4    5
32. The reinforcement procedures were
    easy for the teacher to implement         1    2    3    4    5
33. Students worked hard to earn more than
    their "to beat" score on the test         1    2    3    4    5
34. Students enjoyed the reinforcement
    procedures                                1    2    3    4    5
35. I plan to use the procedures for
    reinforcement in the future               1    2    3    4    5

SPRING WORKSHOP

36. Workshop materials were clear and
    helpful                                   1    2    3    4    5
37. Workshop was appropriate in length        1    2    3    4    5
38. Information gained from the workshop(s)
    was worth the amount of time required     1    2    3    4    5
39. As a result of the workshop, I was
    a better test administrator               1    2    3    4    5
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4 . 

Appendix G

Materials Related to Instrumentation

1. Instrument (with Percentage of Teachers Observed Doing Each
Alternative Broken Down by Group) Used to Collect Data on
Quality of Test Administration

2. Instrumentation and Explanations Used to Collect Data on
Student and Teacher On-Task Behavior During Standardized
Testing

3. Observer Training Outline and Schedule

4. Procedures for Observers to Collect the Data for 6 Measures:
Quality of Test Administration, Teacher and Student On-Task
Behavior, Student and Teacher Attitude, and Student Test-
Wiseness

5. Schedules for Teachers and Observers for Classroom Visits
During Testing Week

6. Instrument (with Means and Standard Deviations for Each Item
and Subscale Broken Down by Group) Used to Collect Data on
Teacher's Attitude Toward Standardized Tests

7. Directions for Administering Student Attitude and Student
Test-Wiseness Forms

8. Instrument (with Means and Standard Deviations for Each Item
Broken Down by Group) Used to Collect Data on Student's
Attitude Toward Standardized Tests

9. Instrument (with Percentage of Respondents Selecting Each
Option Broken Down by Group) Used to Collect Data on
Student's Test-Wiseness Skills
OBSERVATION FOR STANDARDIZED ACHIEVEMENT TESTING

Teacher __________   School __________   Date __________

SD   TD   Observer __________   Partner __________

STUDENT #1    1  2  3  4     TOTAL ONTASK ____   % ONTASK ____
STUDENT #2    5  6  7  8     TOTAL ONTASK ____   % ONTASK ____
STUDENT #3    9 10 11 12     TOTAL ONTASK ____   % ONTASK ____
STUDENT #4   13 14 15 16     TOTAL ONTASK ____   % ONTASK ____
STUDENT #5   17 18 19 20     TOTAL ONTASK ____   % ONTASK ____
TEACHER      21 22 23 24     TOTAL ONTASK ____   % ONTASK ____

Unusual Circumstances:

CODE:  Ontask (for entire interval)
       Probably ontask
       Off task (for part of interval)

       Beginning of test time
       End of timed test
       No record made (explain in NOTES section)

Directions: Record 4 intervals on one student before observing the next
student. Observe 5 students and one teacher for a total of 24
intervals before repeating the sequence.
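The directions above fix a 24-interval rotation (four intervals each for five students, then the teacher) and ask for TOTAL ONTASK and % ONTASK per row. The Python sketch below shows one way to tally a row. The string codes "ON", "PROB", "OFF", and "NR" are hypothetical stand-ins for the form's symbols, and the form does not say whether "probably ontask" counts toward the total or how "no record made" intervals enter the percentage, so both rules here are assumptions.

```python
from typing import List, Tuple

# Hypothetical row labels; the form lists five students followed by the teacher.
SUBJECTS = ["Student 1", "Student 2", "Student 3", "Student 4", "Student 5", "Teacher"]
INTERVALS_PER_SUBJECT = 4


def interval_schedule() -> List[Tuple[int, str]]:
    """Return (interval number, subject) pairs for one 24-interval sequence."""
    schedule = []
    interval = 1
    for subject in SUBJECTS:
        for _ in range(INTERVALS_PER_SUBJECT):
            schedule.append((interval, subject))
            interval += 1
    return schedule


def percent_on_task(codes: List[str], count_probable: bool = True) -> float:
    """Percent of recorded intervals coded as on task.

    Hypothetical codes: "ON" (on task for entire interval), "PROB" (probably
    on task), "OFF" (off task for part of interval), "NR" (no record made).
    Assumptions, not stated on the form: "NR" intervals are dropped from the
    denominator, and "PROB" counts as on task unless count_probable is False.
    """
    on_codes = {"ON", "PROB"} if count_probable else {"ON"}
    recorded = [c for c in codes if c != "NR"]
    if not recorded:
        raise ValueError("no intervals were recorded")
    on_task = sum(1 for c in recorded if c in on_codes)
    return 100.0 * on_task / len(recorded)


# One pass over Student 1 (intervals 1-4).
print(percent_on_task(["ON", "ON", "PROB", "OFF"]))         # 75.0
print(percent_on_task(["ON", "ON", "PROB", "OFF"], False))  # 50.0
```

Exposing `count_probable` as a parameter makes it easy to see how much the "probably ontask" decision moves a student's percentage.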



Observer Training

1. Introduction to tests

a. group-administered standardized achievement tests; show example

b. machine scorable; multiple-choice format; no separate answer sheet

c. tests cover reading and math

d. observe only reading subtests

e. both TD and SD
- TD example: word study
- SD example: timed test (vocabulary, comprehension)

f. observe both teacher and students

2. Types of observations

a. two types: checklist and interval recording

b. training is important to clearly define the parameters, to increase reliability/consistency

c. not feasible to record all behaviors; no way to summarize them

d. reduce to categories, then to numbers, for data analysis

e. work in pairs

3. Quality of Test Administration Checklist

a. go over heading

b. Class Environment
TAPE: stop at "hurry boys"

c. Student Preparation: Remind Students
TAPE: stop at "I'm going to give . . . ."

d. Positive Atmosphere and Reading Directions
TAPE: stop at "stop tape"

e. End of test: after the test, fill in checklist

4. Teacher On-Task

a. go over definition

b. watch for teacher on-task
TAPE: stop when teacher moves over

5. Student On-Task

a. go over definition

b. watch for student on-task
TAPE: stop at "stop tape"

6. Observation form

a. go over items on form

b. play tape; listen to intervals

c. explain use of entire interval

d. practice on tape: start at 515 (timed), no teacher

e. check standard

f. practice on tape: start at 421 (timed), with teacher

7. Complete Rehearsal

a. organize materials

b. go over observation procedures

c. leave room and return to set up

d. start checklist at 277

e. start interval at 340 (teacher directed direction giving)

f. continue teacher directed at 376

g. complete checklist at 402 (stop interval) (Channel 1)

8. Schedule

a. district schedule

b. consultant forms

c. go over first page

d. write name of contact person and schedule on front

e. Monday's schedule in detail (TD and SD will not be concise)



Name of Observer __________   Headquarters Phone __________   District __________

SCHEDULE

Date                  Time          Location                Activity
Granite/Nebo  Cache
3/26          3/26    9:00-3:00     Sirloin Stockade, SLC   Training
3/29          4/5     8:30-10:00    District Schools        Data collecting
3/29          4/5     1:00-3:00     Headquarters            Retraining
3/30          4/6     8:30-12:00    District Schools        Data collecting
3/31          4/7     8:30-12:00    District Schools        Data collecting
4/01          4/8     8:30-12:00    District Schools        Data collecting
4/01          4/8     1:00-3:00     Headquarters            Training
4/02          4/9     8:30-3:00     District Schools        Data collecting
4/02          4/9     3:00-4:00     Headquarters            Final meeting



SUBTESTS FOR OBSERVATION

District   Test   Subtest                     Time   Teacher Directed   Timed Test
Granite    SAT    Word Study Skills: Part A   10     X
                  Reading: Part B             25                        X
Cache      CTBS   Word Attack                 38     X
                  Reading Comprehension       28                        X
Nebo       ITBS   Word Analysis               20     X
                  Stories                     15                        X


NOTES ON OBSERVATIONS

1. Each test (both teacher directed and timed tests) will be observed in each
classroom.

2. Observers will be randomly paired each day.

3. During each observation, paired observers will collect data first on the
teacher directed, then on the timed tests. Tests will be administered
consecutively with a 5-10 minute break between.

4. On Monday, observers will practice for 1 hour in a classroom before
retraining that afternoon at headquarters.

5. Data will be collected in the schools on Tuesday, Wednesday, and Thursday
mornings.

Observers are to return to headquarters each morning after observations have
been completed. Forms will be checked and observers will be given new forms
and equipment for the next day.

On Thursday afternoon, observers will be trained to administer the
test-wiseness and student attitude scales. These scales will be administered on
Friday.

A final meeting is scheduled on Friday afternoon.
Checks will be mailed to you on May 10th.
OBSERVATION PROCEDURES

1. Locate the schools before the day you observe. Actually drive to any
schools that you may have difficulty finding in a hurry.

2. Fill out forms with the information that you have. Remember to bring
the tape recorder, earphone, pencils, tape recording, forms, and clipboard
to the school.

3. After driving to your assigned school, leave extraneous items (e.g.,
coats, purses, notebooks) in the car if possible.

4. Report to the office to ask for directions to the teacher's room.

5. Report to the teacher's room and check that the subtests are
scheduled correctly (first the teacher directed, then the timed test).

6. Arrange your seating so that students can be clearly seen.

7. Set up the tape recorder and earphones.

8. Select students to observe from those closest to you. Try to select a
representative group by counting off every third student.

9. Identify students on the observation form by hair, shirt, dress, etc., and
coordinate your observation pattern with your partner.

10. Start to fill in the checklist and keep it handy for notes throughout the
subtest.

11. Begin taking interval data for the teacher directed test when the teacher
starts reading the directions from the manual. If the teacher gives
students a five-minute break between subtests, do not record data.
Begin taking interval data for the timed test when the teacher starts
reading the directions.

12. Remember to break eye contact with students who look at you.

13. Don't show data collection forms to the teacher; teachers are naive to the
experimental conditions.

14. When both subtests are finished, obtain the names of the observed
students from the teacher and complete the checklist.

15. Exit from the room as quickly and quietly as possible.

16. Go to the next classroom or to headquarters to report.
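Step 8 asks observers to count off every third student among those seated closest to them, and the observation form has rows for five students. A minimal Python sketch of that systematic selection follows, assuming the class list is already ordered nearest-first. The function name `every_kth`, the starting position, and the choice to raise an error when too few students remain are illustrative assumptions, not part of the procedures.

```python
from typing import List, Sequence


def every_kth(students: Sequence[str], k: int = 3, n: int = 5, start: int = 0) -> List[str]:
    """Pick every k-th student from a list ordered nearest-first.

    Hypothetical helper: counting starts at position `start`, and a
    ValueError is raised if fewer than `n` students can be selected.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    picked = list(students[start::k])[:n]
    if len(picked) < n:
        raise ValueError(f"only {len(picked)} students selectable; need {n}")
    return picked


seating = [f"S{i}" for i in range(1, 16)]  # 15 hypothetical students, nearest first
print(every_kth(seating))  # ['S1', 'S4', 'S7', 'S10', 'S13']
```

In a small class an observer would more likely take every second student than stop short; raising an error simply makes that decision explicit.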
Notes for trainer (attitude and TW)

1. Stand in front of the class (make sure you can observe all students).

2. All students must be seated and facing the front of the class before
administering forms.

3. Students need to have a sharpened pencil and an eraser on top of the
desk.

4. Students must follow directions (very important).

a. Allow time for

b. Make sure students put names on booklets.

5. Pacing of questions is critical.

a. Allow time for questions.

b. Allow reasonable time for completion of item(s).

6. Discuss class response cue.

Order of training

1. Model the administration of each form. Observers should actually
work as though they are the students.

2. Discuss the notes above and procedures for administration.

3. Supervise the observers as they practice administering both
forms. (All observers should administer all items.)

4. Distribute envelopes, rubber bands, and extra forms.

5. Schedule debriefing meeting on Friday afternoon.
GENERAL PROCEDURES FOR ADMINISTERING
ATTITUDE AND TEST-WISENESS INSTRUMENTS

You will administer three forms:

a. Student attitude

b. Student test-wiseness

c. Teacher attitude

When referring to forms, call them "booklets".

At appropriate breaks in the administration, praise students for working
hard, trying their best, listening to instructions, and paying attention.

Make sure the teacher's name is on the teacher attitude form.

Use group response to obtain answers to questions.

Stand in front of the class when giving directions or reading items.
Make sure you can see all the students' faces.

Clarify and repeat directions (if necessary) and items for the student
attitude form.

Clarify and repeat directions (if necessary) for the test-wiseness forms.

Do not repeat or explain any items on the test-wiseness forms. If students
want help, tell them to try their best.

Proceed in this order:

a. Give the teacher attitude form to the teacher before starting with
the students.

b. Introduce yourself to the class with your name and purpose. (For
instance, "We want to find out how second grade students feel about
tests.")

c. Have students put a pencil and an eraser on top of their desks.

d. Pass out and administer the student attitude booklet.

e. Collect the student attitude booklet.

f. Pass out and administer the student test-wiseness booklet.

g. Collect the student test-wiseness booklet.
Procedures for Friday

1. You will administer three forms:

a. Student attitude

b. Student test-wiseness

c. Teacher attitude

2. Give the teacher attitude form to the teacher before starting
with the students.

3. Introduce yourself with your name and purpose. (For instance,
"We want to find out how second grade students feel about tests.")

4. Have students put a pencil and an eraser on top of their desks.

5. Pass out and administer the student attitude form.

6. Collect the student attitude form.

7. Pass out and administer the student test-wiseness form.

8. Collect the student test-wiseness form.

9. When referring to forms, call them "booklets."

10. At appropriate breaks in the administration, praise students for
working hard, trying their best, listening to instructions, and
paying attention.
SCHEDULE FOR OBSERVERS
Means and SD on Each Item

Instructions: "Please read each question carefully and circle the response which
best indicates your feelings about standardized achievement tests.
There is no need to write your name on this questionnaire, because
only group responses will be analyzed. There are no right or wrong
answers, so please be as honest and candid as possible. Thank you
for your help."

Percentage of Respondents Selecting Each Option

EI   EII   C   Total


I. What is your general opinion of standardized achievement tests?

3.00
3.00
3.10
3.05

1.

5.2

Harmful for students 1

.95
.75
.91
.87

3.24  1.09
3.41  1.06
3.10  1.16
3.24  1.10


2. 


Not useful 6.9
for teachers 


2.71 
1.23 


2.71 
1.05 


2.90 


2.78 
1.03 


3. 


12.1 
Unfair 1 


• O CO 

1.12 


.79 


0 7 C 

1 . / D 

.85 


O C A 
C. 04 

.93 


4. 


13.8 
Invalid 1 


3.29 
.96 


3.29
.92


3.05 
1.05 
5.

1.7

Too difficult for
students

(1-5)










II. How do you feel about administer 
8.

Not knowledgeable 1

9.

Insecure 1

Total


(6-10) 



5.2 13.8 56.9 19.0 5.2

2 3 4 5 Helpful for students 



for teachers 



3 4 5 Fair 

8^12.1 , 1.7 
3 4 5 Valid 



5 Appropriately difficult
for students

Calm
Interested
Knowledgeable
Supportive
Confident
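For item 1 the Total column reads a mean of 3.05 and an SD of .87, and the item's percentage row reads 5.2, 13.8, 56.9, 19.0, 5.2. The Python sketch below checks that these figures agree. It assumes the values belong together as they appear in the flattened table and that the percentages correspond to 3, 8, 33, 11, and 3 of 58 teachers; the count of 58 is an inference (every percentage is a multiple of 1/58), not a figure stated in this section. This is a consistency check, not the authors' analysis code.

```python
import math
from typing import Sequence, Tuple


def mean_and_sd(counts: Sequence[int]) -> Tuple[float, float]:
    """Mean and sample SD (n - 1 denominator) of a 1-5 rating item,
    given how many respondents chose each option 1..5."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two respondents")
    values = range(1, len(counts) + 1)
    total = sum(v * c for v, c in zip(values, counts))
    total_sq = sum(v * v * c for v, c in zip(values, counts))
    mean = total / n
    var = (total_sq - total * total / n) / (n - 1)
    return mean, math.sqrt(var)


# Item 1, all teachers, read as counts out of an assumed 58 respondents.
mean, sd = mean_and_sd([3, 8, 33, 11, 3])
print(round(mean, 2), round(sd, 2))  # 3.05 0.87
```

With a population (n denominator) SD the same counts give about .86, so the reported .87 is consistent with the sample formula.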
III. Standardized achievement test results are used in many ways by different teachers
and school systems. Please indicate how useful you think such test results could be
for each of the following purposes:

Means and SD on Each Item and Subscale for Each Group

EI   EII

55



11. To report to parents to help them interpret their
child's performance in school.

12. To measure the educational status of individual
students as compared with others their age.

13. To measure the educational "growth" of students
from year to year.

14. To screen students for and make decisions about
placement in special education programs.

15. To help plan instruction for individual
students.

16. To help plan instruction for class groups.

17. To evaluate specific teaching methods,
instructional materials, and/or educational
programs.

18. To report to newspapers informing the public
about differences between schools.

19. To report to administrators as an aid in decision
making.



Percentage of Respondents

Harmful/Not Useful   Somewhat Useful   Very Useful



20. To evaluate and make comparisons between the
performance of different teachers.

8.6 1
3.4 1
41.4 2
51.7 3
58.6 1
31.0 2
10.3 3
0.0

Total (11-20)

IV. Would you personally be in favor of:

21. Increased use of minimum competency tests
to determine high school graduation?

22. Increased use of achievement test results to
compare how successful various schools are?

23. Increased use of achievement test results
for feedback to students about their
performance?

Strongly in favor   Somewhat in favor   Somewhat against   Strongly against

24. Increased use of achievement test results
by classroom teachers to make instructional
and curriculum decisions?

Total (21-24)

1.7
20.7

Means and SD on Each Item and Subscale for Each Group

.75
.83

Percentage of Respondents
V. How do you think standardized achievement tests make your students feel? 



                   1      2      3      4      5
25. Anxious        15.5   46.6   25.9   12.1   0.0    Calm
26. Smart          1.7    19.0   41.4   32.8   5.2    Dumb
27. Bad            5.2    17.2   56.9   17.2   3.4    Good
28. Afraid         8.6    34.5   37.9   10.3   8.6    Not afraid
29. Successful     5.2    17.2   50.0   20.7   6.9    Unsuccessful
30. Insecure       5.2    22.4   53.4   15.5   3.4    Confident



Total (25-30) 



VI. This is the end of the questionnaire concerning teachers' attitudes towards
standardized tests. To assist us in analyzing the other data from the project,
we would like you to answer a few more questions concerning standardized test
administration procedures in your classroom. Again, there are no right or wrong
answers.

Part VI responses are summarized on the Quality of Test Administration Checklist.

1. When did you first tell your students they would be taking a standardized
achievement test?

a. the day the test began

b. the day before the test began

c. 2-5 days before the test began

d. other, please specify

2. Did you do anything in particular to prepare the students for taking the
standardized test?

a. No

b. Yes (please explain)

3. During the standardized test, did you give students any specific
instructions about what they should do if they finished a timed subtest
before the allotted time was up?

a. No   b. Yes (please explain)
4. The diagram below shows how a typical classroom might be arranged. For the 5
children in your class who have the most difficulty with misbehaving, acting
out, or lack of attention, indicate with "X's" (one for each child) the
approximate location of where they sat during the standardized test. (Note:
even though your room may not have individual desks or the desks may be
arranged differently, you can still approximate the location of these
students using this diagram.)

Front of room

Back of room
STUDENT ATTITUDE TOWARDS STANDARDIZED TESTING

DIRECTIONS

DO / SAY

1. DO: Print this on the blackboard.

   fun 0 0 0 boring

   good 0 0 0 bad

2. DO: Demonstrate where to put pencil.
   SAY: I will pass a booklet to each of you. Put your pencils on the
   booklet. Do not turn the booklet over until I tell you to.

3. DO: Pass out copies of booklets face down to each student.
   SAY: Today I will ask you some questions about how you feel toward the
   test you have been taking this week. What we will do today is not
   really a test because there are no right or wrong answers. I will
   read part of a sentence, then you will mark the answer to finish the
   sentence.

   SAY: First, I will show you how to mark your answers. Listen to this
   sentence. I think playing baseball is ... Say that with me. I think
   playing baseball is ...

6. DO: Point to the first line on the board.
   SAY: Look at the board. Here are some answers and circles. At one end
   is a circle near the word "fun." At the other end is a circle near the
   word "boring." The circle in the middle means that playing baseball is
   not fun and not boring but sort of in between.

   DO: Point to each circle. Fill in circles with side-to-side motion.
   SAY: Who thinks that playing baseball is fun? Then you would have
   marked this circle. How many think playing baseball is in between fun
   and boring? Then you would have marked this circle. How many think
   playing baseball is boring? Then you would have marked this circle.
   Remember, fill in only one circle for each sentence.

   SAY: Some people think baseball is fun and others think it is boring
   or in between. There is no right or wrong answer for this question. Is
   there a right or wrong answer to this question? (Students: No.)

9. DO: Demonstrate. Pause and check fingers.
   SAY: Turn your paper over. (Pause) Look at the first page. Put your
   finger on page number 1 at the bottom.

10. DO: Check names.
   SAY: Put your finger on the word "name" at the bottom. Write your
   first and last name on the line.

11. DO: Pause and check fingers.
   SAY: Now, put your finger on sample number 1 at the top of the page. A
   sample shows you how to do other items. What does a sample show you?
   (Students: How to do the other items.)

12. DO: Check fingers. Pause and wait for students to mark paper.
   SAY: For sample number 1, I will read a sentence and you will mark the
   circle that tells best how you feel. The sentence is, I think broccoli
   tastes ... Say that with me. I think broccoli tastes ... Now, mark the
   circle that tells how you think broccoli tastes.

13. DO: Demonstrate on board. Fill in circles.
   SAY: If you think broccoli tastes good, you should have marked this
   circle. If you think broccoli tastes bad, you should have marked this
   circle. And, if you think broccoli tastes kind of in between good and
   bad, you should have marked this circle.






14. SAY: Different people think broccoli tastes different, so there is no
   right or wrong answer. Is there a right answer for this sentence?
   (Students: No.)
   That's right, there is no right or wrong answer for this sentence.

15. DO: Check fingers. Pause and wait for students to mark paper.
   SAY: Put your finger on sample number 2. Listen. The sentence is, Math
   problems are ... Say it with me. Math problems are ... Math problems
   are good, in between, bad. Now, mark the circle that tells best how you
   feel about math problems.

16. DO: Check fingers.
   SAY: Now, turn the page to page number 2 and put your finger on page
   number 2 at the bottom so I can see you are on the right page. Good.
   For these items, you will mark the circle that tells how you feel about
   the test you have been taking this week. Remember, there are no right
   or wrong answers. Are there any right or wrong answers to these
   questions? (Students: No.)

17. SAY: For each item, you should mark a circle that is near the word that
   tells how you feel. Who knows which circle you should mark?
   (Students: Near the word that tells how I feel.)
   Raise your hand and ask me if you have any questions.

18. DO: Repeat any items that seem confusing, and explain words that
   students do not understand.
   SAY: Good. Point to item number 1. Taking tests makes me feel not
   afraid, in between, afraid. Mark the circle that shows best how taking
   tests makes you feel.

19. SAY: Finger on item number 2. Taking tests makes me feel happy, in
   between, sad. Mark the circle that shows best how you feel.

20. SAY: Item number 3. Taking tests makes me feel smart, in between, dumb.

21. SAY: Item number 4. Taking tests makes me feel good, in between, bad.

22. SAY: Item number 5. Taking tests makes me feel calm, in between, nervous.

23. SAY: Item number 6. I think tests are fun, in between, boring.

24. SAY: Item number 7. I think tests are fair, in between, not fair.

25. SAY: Item number 8. Do tests help your teacher teach you better? Yes,
   in between, no.

26. SAY: Check your paper to see that you have marked a circle for every
   item. Now, pass your papers to the front of the room and I will collect
   them.
Sample #1

good 0 0 0 bad

Sample #2

easy 0 0 0 hard

Name __________

2

1  not afraid  0 0 0  afraid
2  happy       0 0 0  sad
3  smart       0 0 0  dumb
4  good        0 0 0  bad
5  calm        0 0 0  nervous
6  fun         0 0 0  boring
7  fair        0 0 0  not fair
8  yes         0 0 0  no
MEAN SCORE
STANDARD DEVIATION

EI EII C Total



r- .0 



1.3 


1.4 • 


1.4 


1.4 


,6 


,fi 


■ / 


r 




£=,.04 




1.4 


1.4 i 


1.5' 


1.4 


.6 




.6 " 


.6 






08 




1.3 


1.3 


1.3 


1.3 


r 
• U 


.6 


.6 


4 


1.7 


1.4 


' 1.5 


0 

1 A 
1,4 


.7 


.6 


.7 ' 


. .6 




£= .05 




1.9 


2.0 


2.0 


2.0 


.9 


.9 


.9 


.9 


1.9 


1.7 


2.0 


1.9 


.9 


.8 


."9 


.9 




p.O 




1.4 


1.3 


1.4 


1.4 


.7 


.6 


.6 


.7 


1.3 


1.2 


1.4 


1.3 


.6 


.5 


.7 


.6 



Percent of Students 
Selecting Each Option 



66. 6 28.7 4.6 ~T' 

1 not afraid 0 0 Q afraid 



6". 2 ' 32.7 

2 happy 0- 0 



5 



7 



8 



39.7 

calm Q- 



6 fun 0" 



81.5 

yes^Q- 



9.2 1 

-0 



6.0 \ 

% r 



sad 



\ 



. 77.4 15.0 .,_ ...7,6 \ 

3 ' smart ^ Q -Q Q-dumb ) 

■ \ 

67.0 , 24.6 . 8.4 ■■• ^ 

4 . good (>_{)— 0 bad 



20.8 39.4 

-0— Q nervous 



34.1 

-0 boring 



17.3 

o- 



73.7 ■ . 16.4 : 9.7 ' \ 

to 0 0— 0 not fair • : 



Do tests help your teacher teach 
, you better? 



9.0 

-Ono 



CO 

Ah, 
                                                      EI     EII    C      Total

6. Which of the following animals does not lay eggs?
   0 chickens                                         6.7    6.7    4.1    5.8
   0 snakes                                           31.2   27.5   31.5   30.3
   0 birds                                            4.7    2.8    4.5    4.1
   1 orangutans                                       56.9   61.9   59.2   59.2
   Blank                                              0.4    1.0    0.6    0.7

7. Abraham Lincoln, who lived during the Civil War, was
   0 a president of the United States                 67.1   65.5   68.0   67.0
   0 a multimillionaire                               11.0   8.8    9.7    9.9
   0 a pilot                                          7.3    6.5    6.4    6.8
   0 a soldier                                        13.9   16.8   14.6   15.0
   Blank                                              0.6    2.3    1.3    1.3

8. Which of the following animals does lay eggs?
   0 gnus                                             26.3   20.5   24.2   23.9
   0 snakes                                           50.2   50.3   49.8   50.1
   0 sheep                                            11.2   13.5   12.0   12.1
   0 mice                                             11.8   14.0   13.3   13.0
   Blank                                              0.4    1.8    0.6    0.9

9. What is a hyperbola?
   0 a planet                                         26.9   28.5   33.3   29.6
   9 the locus of a point whose difference in
     distances from two fixed points is constant      29.8   26.9   27.3   28.1
   0 burned wood                                      17.6   16.8   13.9   16.1
   0 a satellite                                      23.3   23.8   23.4   23.5
   Blank                                              2.4    3.9    2.1    2.8

10. Dwight Eisenhower
   0 was a general during World War II                45.7   43.0   45.3   44.8
   0 astronaut                                        14.1   16.1   20.2   16.8
   0 of Russia                                        19.8   17.4   15.5   17.6
   0 in the United States                             17.8   19.7   16.5   17.9
   Blank                                              2.7    3.9    2.6    3.0



7.8 
22.9 
15.1 
53.1 

1.2 



11. 



8.3 
22.0 
12.7 
54.9 

2.1 



6?\ 7.5 
20.4 21.8
15.2 14.5 



56.0 
1.7 



54.6 
1.6 



Sally and Jane are in the same class. Sally can hang from
the bar for 10 seconds. Jane can hang from the bar a lot
longer than anyone in her class. She hangs for

0 2 seconds

0 10 seconds

0 30 seconds

0 60 seconds



22.4 
28.4 
18.2' 
28.2 
2.9 



9 



22.5 
26.2 
18.4 
28.8 
4.1 



19.7 
30.3 
19.5 
27.0 
3.4 



27.0
25.9
18.4
26.0
2.8

12. "Four score and seven years ago . . ." is the beginning of the

0 Papal Encyclical Rerum Novarum
0 President Reagan's State of the Union message
0 The Gettysburg Address
0 John Kennedy's funeral eulogy
Blank
PART A

E1   E2   Total

Sample: On a nice day, the sky is

0 white
0 blue
0 pink
0 black



26.1 23.3 26.6 25.5



15.1 
43.7 
13.9 
1.2 



15.8 
42.0 
17.6 
1.3 



16 
45 
9 
1 



15.8 
43.7 
13.6 
1.4 



1. An anesthesiologist is a

0 basketball player
0 barber
9 physician
0 hairdresser
Blank



2. What sport is played on a field? 



34.9 
47.6 

4.1 
13.3 

0.2 



46.4 
40.9 
4.4 
8.0 
0.3 



36.3 
50.2 

3.0 
10.3 

0.2 



38.7 
46.6 

3.8 
10.7 

0.2 



0 soccer
0 football
0 polo
9 all of the above
Blank



3. Who discovered how to pasteurize milk? 



1976, 
16.9 
31.2 
31.0 
1.0 



20.7 
13.1 
26.9 
32.6 
1.6 



20.4 
17.2 
33.3 
28.1 
1.1 



20.3 
17.4 
30.7 
30.5 
1.2 



0 Madame Pompadour
0 Madame Curie
0 Louis Pasteur
0 Milo Bishop
Blank



33.3 29.0 35.4 32.8 



22.2 
21.4 
21.6 
1.4 



19.4 
20.2 
28.8 
2.6 



23.0 
18.7 
21.5 
1.5 



88.7 
20.1 
23.6 
1.8 



4. Herringbone is

0 a fossil
8 pickled mackerel that has been frozen
0 a fabric pattern
0 a small animal that lives on Mars
Blank

5. About how many glasses of water should a person drink each day?



26.1 
6.9 
52.2 
14.1 
0.6 



23.8 
7.5 

59.1 
8.5 
1.0 



24.0 
7.9 

58.6 
9.0 
0.4 



24.7 
7.5 
56.4 
10.7
0.7 



0 1
0 30
• 8
0 50
Blank
E1



E2 



Total 



13. 



58 


.4 


62 


.2 


60 


.3 


60.1 


14 


.3 


.14 


.8 


14. 


2 


14.4 


11 


.0"** 


8 


.5 


7 


.9 


9.2^ 


13 


.7 


13 


.0 


13 


9 


13.6 : 


2 


.7 


1 


■6 


3 


.6 


2.7 


20 


.4 . 


20 


.2- 


16 


.7 


19.1 


15 


.1 


12 


.2 


11 


.2 


12.9 




rf— 








.4 


13.5 


46 


.1 


52 


.1 


56 


.0 


51.3 


3 


.5 


2 


.6 


3 


.6" 


3.3 


18 


.6 


13 


.7 


14 


- 

.4 


15.7 


31 


.2 


39 


.6 


40 


.1 


36.7 


22 


.0 


17 


.9 


21 


.0 


20.5 


24 


.1 


26 


.4 


20 


.2 


23.4 


4 


.1 


2 


.3 


4 


.3 


3.7 


13 


.7 


10 


.6 


12 


.7 


12.4 


16 


.1 


17 


.4 


15 


.9 


16.4 


18 


.0 


20 


.7 


17 


.0 


18.4 


46 


.9 


47 


.4 


49 


.6 


48.0 


,5 


•3 


3 


.9 


4 


.9 


4.8 


18 


.8 


14 


.0 


15 


.7 


16.3 


43 


.7 


50 


.5 


45 


.7 


46.3 


8 


.6 


9 


.8 


6 


.7 


8.3 


24 


.3 


22 


.5 


26 


.8 


24.7 


4 


.7 


3 


.1 


5 


.2 


4.4 


23 


.3 


19 


.9 


■ 

25 


.5 


23.1 


35 


.9" 


36 


.5 


36 


.9 


36.4 


20 


.2 


22 


.5 


18 


.7 


20.3 


14 


.1 


15 


.0 


12 


.4 


13.8 


6 


.5 


6 


.0 


6 

/ 


.4 


6.3 

1 


24 


.3 


19 


.9 


22 


.3 


22.4 


31 


.2 


31 


.9 


36 


.3 


33.2 


17 


.8 


18 


.1 


17 


.6 


17.8 


20 


.0 


23 


.8 


16 


n 
. ~> 


19.8 


6 


.7 


6 


.2 


7 


.5 


6.9 
The number of men who have been president of the United States
is less than

0 45
0 30
0 18
0 7
Blank



14. A stretch of land between two mountains is called a:

0 hill
0 river
0 mound
0 valley
Blank



15. The average person lives about

0 to years
9 73 years
0 150 years
0 200 years

16. Abalone is an ocean crustacean. It lives in the

0 trees
0 ground
0 lake
0 sea
Blank

17. The number of miles from the earth to the moon is less than

0 243,000
0 250,000
0 244,000
0 249,000
Blank

18. Abalone is

0 a disease of one foot
9 a crustacean
0 a cold cut
0 a style of hair
Blank



19. The Susan B. Anthony dollar honors 



0 one of our founding fathers 
9 the woman who led the suffragette movement 
0 a famous baseball player 
0 the husband of Betsy Ross 
Blank 



GO ON 



    E1    E2     C Total

20. The equator ...
  48.6  43.0  42.5  44.9   (illegible)
  17.1  14.5  15.5  15.8   mossy rocks
  15.3  22.8  19.1  18.8   pieces of lava rock
  11.0  12.4  12.2  11.8   thick black smoke
   8.0   7.3  10.7   8.7   Blank

21. The Star Spangled Banner, written by Francis Scott Key, is
  36.7  44.0  43.3  41.1   the national anthem
  18.8  16.3  20.0  18.5   a poem
  22.9  19.4  14.2  18.9   a book
  12.4  13.0  12.0  12.4   a magazine
   9.2   7.3  10.5   9.1   Blank

22. Which is not a flower?
  11.8   9.6   6.0   9.2   rose
  26.3  22.5  26.8  25.4   paisley
  10.8   9.3  10.3  10.2   tulip
  41.2  52.3  44.6  45.6   goldenrod
   9.8   6.2  12.2   9.6   Blank

23. What does erosion do?
  17.1  18.9  18.2  18.0   carries light
  26.9  23.8  28.1  26.5   makes oxygen
  14.3  17.9  12.4  14.7   gives flowers water
  29.4  29.5  27.7  28.8   carries away the material that is broken up by weathering
  12.2   9.8  13.5  12.0   Blank

24. Which of the following are vegetables?
  23.5  24.9  25.5  24.?   corn
  13.9  13.5  13.1  13.5   rutabaga
  18.2  27.2  21.0  21.8   carrots
  32.2  23.3  25.8  27.4   all of the above
  12.2  11.1  14.6  12.7   Blank

25. The capital of Oklahoma is
  12.4  11.1  10.7  11.5   Tulsa
  45.9  50.8  51.3  49.2   Oklahoma City
  13.1  13.7   9.2  11.9   Phoenix
  14.5  12.2  11.6  12.8   Salem
  14.1  12.2  17.2  14.6   Blank

26. Tell which movie was not about the ...
  18.2  19.2  13.9  17.0   2001 Space Odyssey
  27.6  24.6  23.4  25.3   1776
  15.9  17.6  17.2  16.8   Star Trek
  25.5  25.1  28.5  26.5   Star Wars
  12.9  13.5  17.0  14.5   Blank
Sample: (picture of a table)
   0 top
   0 leg
   0 table
   0 wood

    E1    E2     C Total

1.
  12.4  10.1  15.7  12.9   plant
   0.2   0.3   0.2   0.2   boy
   0.0   0.0   0.0   0.0   house
  86.3  88.3  83.7  86.0   flower
   1.0   1.3   0.4   0.9   Blank

2.
  15.5  15.5  17.0  16.0   paper
  27.1  30.3  27.9  28.3   sidewalk
  53.3  51.3  50.9  51.9   abode
   2.4   1.8   2.1   2.2   (illegible)
   1.5   1.0   2.1   1.6   Blank

3.
   2.2   1.3   2.8   2.2   cut
  93.7  95.3  93.6  94.1   lawn mower
   1.4   0.8   0.9   1.0   blade
   2.4   1.8   2.6   2.3   grass
   0.2   0.8   0.2   0.4   Blank

4.
   6.5   6.0   9.2   7.3   animal
   0.2   0.3   1.1   0.5   paw
   0.4   0.5   0.4   0.4   hair
  9?.?  92.5  89.3  91.4   dog
   0.2   0.8     ?   0.3   Blank

5.
   3.9   4.4   4.9   4.4   watch
   1.6   1.3   1.5   1.5   look
  93.7  93.0  92.9  93.2   television
   0.8   1.3   0.6   0.9   Blank

6.
   4.7   3.6   6.2   4.9   food
   1.0   1.3   2.8   1.7   meat
  93.7  93.8  90.6  92.6   hamburger
   0.0   0.0   0.0   0.0   (illegible)
   0.6   1.3   0.4   0.7   Blank

7.
  72.7  81.1  77.7  76.8   canine
   6.3   5.7   6.4   6.2   lamp
   5.5   4.1   4.7   4.8   chair
  12.2   6.2   9.0   9.4   house
   3.3   2.8   2.1   2.8   Blank

8.
   2.2   0.3   0.9   1.2   clean
  93.5  96.4  95.5  95.0   bathtub
   3.1   1.8   2.1   2.4   water
   0.4   0.5   1.3   0.7   faucet
   0.8   1.0   0.2   0.7   Blank



    E1    E2     C Total

The boy, whose name was John, went to the store.

1.
   2.4   3.4   2.4   2.7   store
   4.3   2.6   2.1   3.1   was
  91.8  92.5  92.7  92.3   went
   1.0   1.0   1.9   1.3   the
   0.4   0.5   0.9   0.6   Blank

2.
  87.6  86.8  86.5  87.0   the
   1.6   1.8   1.5   1.6   went
   1.4   2.8   1.7   1.9   boy
   9.2   8.0   9.9   9.1   store
   0.2   0.5   0.4   0.4   Blank

3.
   3.9   0.8   2.1   2.4   boy
  94.3  94.8  93.1  94.0   John
   1.0   1.8   2.6   1.8   went
   0.6   2.1   0.6   1.0   to
   0.2   0.5   1.5   0.7   Blank

4.
   1.2   0.3   0.6   0.7   the
   2.7   2.1   1.5   2.1   name
   2.0   0.5   0.4   1.0   was
  93.3  96.4  96.1  95.2   store
   0.8   0.8   1.3   1.0   Blank

5.
  77.3  81.9  85.8  81.6   went
  12.4  11.9   8.4  10.9   John
   7.3   4.1   3.6   5.1   the
   1.8   1.3   1.3   1.5   store
   1.0   0.8   0.9   0.9   Blank

6.
  13.5  15.0  16.5  15.0   John
  67.6  69.4  66.3  67.7   was
  17.1  13.2  14.2  15.0   boy
   1.2   1.3   1.5   1.3   to
   0.6   1.0   1.5   1.0   Blank
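Each row in the item tables above gives, for one answer choice, the percentage of students in each group (E1, E2, C) and overall who chose it or left the item blank. In several items the Total value differs slightly from the simple mean of the three group values, which suggests (an assumption, not stated here) that Total pools all students and so depends on group sizes. A minimal sketch of that computation follows; the option codes `a` to `d` and the responses are made up for illustration.

```python
from collections import Counter

# Hypothetical option codes for one four-choice item; "Blank" = no answer.
OPTIONS = ["a", "b", "c", "d", "Blank"]


def option_percentages(responses):
    """Percentage of responses on each option, rounded to one decimal."""
    counts = Counter(responses)
    n = len(responses)
    return {opt: round(100.0 * counts[opt] / n, 1) for opt in OPTIONS}


def item_table(responses_by_group):
    """Per-group percentages plus a 'Total' column pooled over all students."""
    table = {group: option_percentages(resps)
             for group, resps in responses_by_group.items()}
    pooled = [r for resps in responses_by_group.values() for r in resps]
    table["Total"] = option_percentages(pooled)
    return table


# Made-up responses to one item (not data from this report).
example = {
    "E1": ["a", "a", "a", "b"],
    "E2": ["a", "Blank"],
    "C": ["b", "b"],
}
table = item_table(example)
for column in ["E1", "E2", "C", "Total"]:
    print(column, table[column])
```

Here option a gets 50.0 in the Total column (4 of 8 students), whereas the mean of the three group values (75.0, 50.0, 0.0) would be 41.7.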
TEST-WISENESS TEST (CLASS EXERCISE)
DIRECTIONS

1. DO: Pass out copies of booklets face down to each student.

   SAY: Today, we are going to do an exercise together. It is a little
   like a game. Your job is to try your best to find the right answers
   to the questions on these sheets. It will be hard because you will
   not know the right answers to most of these questions. You're not
   supposed to know all of the answers, but you must try your very best
   to figure the answers out. Class, will you know the answers to most
   of these items?

   (Students: No.)

   Right. Is your job to try your best to figure out the answers anyway?

   (Students: Yes.)

   Good! Every item has only one correct answer. How many correct
   answers does each item have?

   (Students: One.)

   That's right! Let's begin.

2. DO: Demonstrate.

   SAY: Turn your booklet over.

3. DO: Pause and check fingers.

   SAY: Put your finger on the number 1 at the bottom of the page. Good!
   Now, put your finger on the sample at the top of the page.

   SAMPLE: On a nice day, the sky is
      0 white
      0 blue
      0 pink
      0 black

   The sample will show you how to do the other items. What does the
   sample show you?

   (Students: How to do the other items.)

   Yes. Read the sample to yourself. (Pause) Now, read the sample
   sentence with me. On a nice day, the sky is. Good. Now, let's read
   the four answer choices together. White, blue, pink, black.

4. DO: Check to make sure students marked answer correctly.

   SAY: Which word is the best answer to the sample question?

   (Students respond.)

   Yes, "blue." "On a nice day, the sky is blue." Everyone, mark the
   space in front of "blue" to show that it is the best answer. (Pause)
   For each item in Part A, try to find the one best answer and mark it
   the same way.

5. DO: Turn to page 4 of test and point to STOP.

   SAY: When I say begin, continue working until you see the word STOP
   on page 4. You should work until you see what word?

   (Students: The word "stop.")

   I will tell you to stop in 10 minutes, so you will have to work very
   fast.

6. DO: Check fingers.

   SAY: Finger on item number 1. Ready, begin.

   DO: Time for 10 minutes. When only 1 minute is left:

   SAY: You have only one minute left to finish.

7. SAY: Stop. Pencils down.

8. DO: Demonstrate.

   SAY: Turn to page 5. (Pause) Put your finger at the top of the page
   where it says Part B. Now, put your finger on the sample. Good! For
   this sample, you must find the word that best tells about the
   picture. Class, what is the picture?

   (Students: A table.)

   That's right, a table. Now, read the four answer choices with me.

   (Students: Top, leg, table, wood.)

   SAMPLE: (picture of a table)
      0 top
      0 leg
      0 table
      0 wood
9. DO: Check to make sure the students fill out the answer space
   correctly.

   SAY: Look at the picture again. Which word best tells about the
   picture?

   (Students: Table.)

   Yes, table is the best answer. Put a mark in the space in front of
   the word "table" to show that it is the correct answer.

10. SAY: Good. When I tell you to begin, you will do all of the items in
    Part B the same way.

11. DO: Demonstrate.

    SAY: Put your finger on the stop sign at the bottom. Continue working
    until you come to this stop sign. Finger on item number 1. Ready,
    begin.

    DO: Time: 2 minutes.

    SAY: Stop. Pencils down.

12. DO: Check to make sure students are on correct page.

    SAY: Nice work! Now, for the next part of our exercise. Turn to
    page 6. (Pause) Look at the top of the page where it says Part C.
    Are you all at the right place?

    (Students: Yes.)

    Great! To do Part C, you will read a sentence and then find the
    answers to the questions I will ask you. You may look back at the
    sentence to help you find the right answer. Class, to find the
    answers to these items, can you look back at the sentence?

    (Students: Yes.)

    I will only be able to tell you the question once, so listen
    carefully.

13. DO: Check to see if students' fingers are on correct items.

    SAY: Now, put your finger on the sentence at the top of the page.
    Read it to yourselves. (Pause) Now, let's read it together. ("The
    boy, whose name was John, went to the store.") Good. Now, put your
    finger on item 1. Listen to the directions. Mark the word that comes
    after the word John in the sentence. (Pause)

    Finger on item number 2. Mark the word that comes before the last
    word in the sentence. (Pause)

    Finger on item number 3. Mark the word that begins with a capital
    letter. (Pause)

    Finger on item number 4. Mark the word that is the last word in the
    sentence. (Pause)

    Finger on item number 5. Mark the word that comes after the second
    comma. (Pause)

    Finger on item number 6. Mark the word that comes before the word
    with the capital letter. (Pause)

14. DO: Check for names.

    SAY: Good. Pencils down. Close your booklets. Now, class, please
    write your first and last names on the top of page 1. Good. Thank
    you for your good work today.
Appendix H

Supplementary Data About Effect of Intervention
on Dependent Variables

Data File Code Book and Listing of Data

Sample Size, Means, Standard Deviations, and Medians for
All Dependent Variables Broken Down by Experimental Group
for Various Subsamples of Students (H1-H6)

Frequencies of Major Variables
MASTER FILE CODE

CARD 1

Columns  Variable  Label      Code  Name
1        V01       GROUP      C1    Experimental Group
2-4      V02       ID         C2    Student I.D. #
5-14                                Blank
15       V03       DIST             District
16-17    V04       SCH        C3    School
18-19    V05       TCHR       C4    Teacher (classroom)
20                                  Blank
21-22    V06       TATOP      C5    Teacher Attitude, Opinion of Tests
23-24    V07       TATFEE     C6    Teacher Attitude, Teacher Feelings
25-26    V08       TATUSE     C7    Teacher Attitude, Use of Tests
27-28    V09       TATINC     C8    Teacher Attitude, Increased Use of Tests
29-30    V10       TATFEEL    C9    Teacher Attitude, Students' Feelings
31-33    V11       TOTTD      C10   Teacher On-Task, Teacher-Directed Test
34-36    V12       TOTSD      C11   Teacher On-Task, Student-Directed Test
37-39    V13       ?                Student On-Task, Teacher-Directed, On-Task
40-42    V14       SOTTDONP         Student On-Task, Teacher-Directed, On-Task Probably
43-45    V15       ?                Student On-Task, Student-Directed, Teacher Stop, On-Task
46-48    V16       ?                Student On-Task, Student-Directed, Teacher Stop, On-Task Probably
49-51    V17       ?                Student On-Task, Student-Directed, Student Stop, On-Task
52-54    V18       SOTSDSOP         Student On-Task, Student-Directed, Student Stop, On-Task Probably
55-59    V19       ATSSD      C12   Achievement Test Scores, Student-Directed
60-64    V20       ATSTD      C13   Achievement Test Scores, Teacher-Directed
65-69    V21       ATSMATH    C14   Achievement Test Scores, Math
70-74    V22       ATSREAD    C15   Achievement Test Scores, Reading
75-79    V23       ATSTOTAL   C16   Achievement Test Scores, Total
80                                  Blank

CARD 2

Columns  Variable  Label      Code  Name
1-4                                 Student ID#
5                                   Blank
6-7      V24       QUALA      C17   Quality of Administration (Items 1-19, 21-31)
8-9      V25       QUALB            Quality of Administration (Items 8-19, 21-31)
10-11                               Blank
12       V26       FS               # Filmstrips Viewed
13       V27       PT               # Practice Tests Completed
14-16    V28       REIN             # Reinforcement Points Earned
17-18    V29       MEANCOPT         Mean Correct on Practice Tests
19       V30       TITLEI           Students in Title I
20       V31       SPED             Students in Special Education
21       V32       ESOL             Students in English as Second Language
22-24                               Blank
25       V33       TSUPP            Teacher Support of Project
26       V34       TQUALIMP         Teacher Quality of Implementation
27-28    V35       EVALTOT          Total Score of Project Evaluation
29-30    V36       EVALFS           Teacher Evaluation of Filmstrips
31-32    V37       EVALPT           Teacher Evaluation of Practice Tests
33-34    V38       EVALCOMM         Teacher Evaluation of Contact With Staff
35-36    V39       EVALDATA         Teacher Evaluation of Observation and Data Collection
37-38    V40       EVALGEN          Teacher Evaluation of Project in General
39-40    V41       EVALREIN         Teacher Evaluation of Reinforcement (Exp. I Only)
41-42    V42       EVALSPWK         Teacher Evaluation of Spring Workshop (Exp. I Only)
43-44    V43       STWA       C18   Student Test Wiseness A
45-46    V44       STWB             Student Test Wiseness B
47-48    V45       STWC             Student Test Wiseness C
49-50    V46       SATT       C19   Student Attitude Score
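The code book above describes a fixed-column card layout: each variable occupies a range of card columns, numbered from 1. A minimal Python sketch of reading such fields follows, using only the card 2 columns 31-50 listed above. The sample card image is invented, and reading every field as a plain integer is an assumption, since this section does not say whether any field carries an implied decimal point. The function name `parse_fields` is illustrative.

```python
# Card 2, columns 31-50, as given in the code book (1-indexed, inclusive).
CARD2_FIELDS = {
    "EVALPT": (31, 32),
    "EVALCOMM": (33, 34),
    "EVALDATA": (35, 36),
    "EVALGEN": (37, 38),
    "EVALREIN": (39, 40),   # Exp. I only
    "EVALSPWK": (41, 42),   # Exp. I only
    "STWA": (43, 44),
    "STWB": (45, 46),
    "STWC": (47, 48),
    "SATT": (49, 50),
}


def parse_fields(card, fields):
    """Slice fixed-width fields out of one card image.

    Blank fields become None. Every other field is read as an integer;
    no implied decimal point is applied (an assumption).
    """
    record = {}
    for name, (first, last) in fields.items():
        raw = card[first - 1:last].strip()
        record[name] = int(raw) if raw else None
    return record


# Invented card image: columns 1-30 blank, EVALREIN and EVALSPWK left empty.
sample = " " * 30 + "20" + "18" + "21" + "20" + "  " + "  " + "12" + "06" + "05" + "11"
record = parse_fields(sample, CARD2_FIELDS)
print(record["STWA"], record["SATT"], record["EVALREIN"])
```

Treating an all-blank field as missing fits the two "Exp. I Only" evaluation fields, which would presumably be empty for the other groups.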



First 29 Cases (2 cards per case) of Title I Data File

[Listing illegible in this reproduction.]



Table H-1

Means, Medians, and Standard Deviations for Dependent
Variables by Experimental Group (All Students)

                                      Group 1                  Group 2                  Control Group
Variable                              N   Mean    SD    Mdn    N   Mean    SD    Mdn    N   Mean    SD    Mdn       F      p
Teacher Attitude (Total)            552   89.1  13.2   89.8  460   87.9  11.1   87.2  474   85.7  12.3   ?5.5    9.85   .000
Teacher Attitude (Opinion)          552   14.7   4.3   14.8  460      ?   3.6   15.1  474   14.9   3.1   14.9       ?   .292
Teacher Attitude (Feeling)          552   20.7   2.6   20.2  460   20.9   2.8   20.7  474   19.5   3.1   19.6   3?.32   .000
Teacher Attitude (Use)              552   24.8   4.7   23.3  460   25.6   3.7   26.8  474   25.9   4.2   25.8    9.0?   .000
Teacher Attitude (Increase)         552   10.5   2.6   10.9  460   10.0   1.9   10.1  474   10.1   2.5   10.1       ?   .000
Teacher Attitude (S. Feel)          552   18.3   4.9   18.2  460   16.2   3.1   16.6  474   15.3   3.3   15.5       ?   .000
Teacher On-Task (TD*)               501   77.1  35.7   98.3  460   73.1  34.7   89.1  432   59.2  49.0   68.9       ?   .000
Teacher On-Task (SD**)              501   80.6  22.8   92.8  460   78.4  24.3   88.7  432   83.2  37.7   87.7       ?   .048
Student On-Task (TD)                 91   88.4  14.3   90.8   85   89.7  11.8   94.4   90   89.2  11.1   92.5       ?   .785
Student On-Task (SD)                 95   90.6  11.9   93.4   79   89.9  11.9   93.8   90   90.5   9.3   92.5     .09   .911
Achievement Test (SD)               512    -.1   1.0     .2  418     .0   1.0     .4  454     .1    .9     .3       ?   .033
Achievement Test (TD)               512    -.1   1.0     .1  417     .1   1.0     .3  454     .0    .9     .1    3.79   .023
Achievement Test (Math)             513    ?.1   1.0     .1  418     .1   1.0     .1  452     .2     ?     .2       ?   .000
Achievement Test (Total Read)       511    -.1   1.0     .1  417     .0   1.0     .3  454     .1    .9     .2    4.36   .013
Achievement Test (Total)            509    -.1   1.0     .0  417     .0   1.0      ?  452     .1    .9     .3    7.68   .000
Quality of Test Administration      501   49.7   8.5   5?.1  460   50.6   3.6   50.9  432   48.8   3.8   48.3    11.?   .000
Test-Taking Skills (Wise)           489    9.9   4.0    9.4  387   10.0   4.0    9.7  462   10.2     ?   10.0     .88   .412
Test-Taking Skills (Deductive)      489    6.1   1.2    6.3  387    6.1   1.2    6.3  462    6.1   1.3    6.3     .45   .640
Test-Taking Skills (Directions)     489    5.1   1.1    5.5  387    5.2   1.1    5.6  462    5.2   1.?    5.5    1.12   .326
Student Attitude                    490   11.9   3.?   11.3  388   11.7   3.4   11.2  468   12.4   3.5   12.1    4.57   .011

* Teacher Directed
** Student Directed
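The last two columns of Table H-1 appear to be an F statistic and its p value for the three-group comparison. Assuming a one-way analysis of variance on the group summaries (the model is not stated in this section), F can be recomputed from each group's N, mean, and SD. With two numerator degrees of freedom, the upper tail of the F distribution has the closed form p = (1 + 2F/v)^(-v/2), where v is the within-groups degrees of freedom, so no statistics library is needed. The function names below are illustrative.

```python
def anova_from_summary(groups):
    """One-way ANOVA F from (n, mean, sd) for each group.

    Returns (F, df_between, df_within).
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def p_value_f2(f_stat, df_within):
    """Exact upper-tail p for F with 2 numerator df: (1 + 2F/df)^(-df/2)."""
    return (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)


# Teacher Attitude (Total) from Table H-1: (N, mean, SD) for the three groups.
teacher_attitude_total = [(552, 89.1, 13.2), (460, 87.9, 11.1), (474, 85.7, 12.3)]
f_stat, df_b, df_w = anova_from_summary(teacher_attitude_total)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, p = {p_value_f2(f_stat, df_w):.5f}")
```

For Teacher Attitude (Total) this gives F(2, 1483) of about 9.9, close to the printed 9.85; the small gap is consistent with the means and SDs being rounded to one decimal. For Test-Taking Skills (Directions), the printed F = 1.12 with v = 489 + 387 + 462 - 3 = 1335 gives p of about .327, in line with the printed .326.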




Table H-2

Means, Medians, and Standard Deviations for Dependent
Variables by Experimental Group
(Students Receiving Majority of Treatmentᵃ)



Table H-3

Means, Medians, and Standard Deviations for Dependent
Variables by Experimental Group
(Students Receiving All of Treatmentᵃ)

Variable
Teacher Attitude (Opinion)
Teacher Attitude (Feeling)
Teacher Attitude (Use)
Teacher Attitude (S. Feel)
Teacher On-Task (TD)
Teacher On-Task (SD)
Student On-Task (TD)
Student On-Task (SD)
Achievement Test (SD)
Achievement Test (TD)
Achievement Test (Math)
Achievement Test (Total Read)
Achievement Test (Total)
Test-Taking Skills (Wise)
Test-Taking Skills (Deductive)
Test-Taking Skills (Directions)

ᵃ ... special education programs, or had English ...



Table H-4

Means, Medians, and Standard Deviations for Dependent
Variables by Experimental Group
(Only Title I Students Receiving All of Treatmentᵃ)

Variable
Teacher Attitude (Opinion)
Teacher Attitude (Feeling)
Teacher Attitude (Use)
Teacher Attitude (Increase)
Teacher Attitude (S. Feel)
Teacher On-Task (TD)
Teacher On-Task (SD)
Student On-Task (TD)
Student On-Task (SD)
Achievement Test (SD)
Achievement Test (TD)
Achievement Test (Math)
Achievement Test (Total Read)
Achievement Test (Total)
Quality of Test Administration
Test-Taking Skills (Wise)
Test-Taking Skills (Deductive)
Test-Taking Skills (Directions)
Student Attitude

ᵃ ... were in special education programs, or had English ...



Table H-6

Means, Medians, and Standard Deviations for Dependent
Variables by Experimental Group (Only Cache and Nebo
Districts with Students Receiving Majority of Treatmentᵃ)

                                      Group 1                  Group 2                  Control Group
Variable                              N   Mean    SD    Mdn    N   Mean    SD    Mdn    N   Mean    SD    Mdn
Teacher Attitude (Opinion)          133      ?     ?   14.4  121   15.2   2.2   16.4  254   15.1   3.9   14.7
Teacher Attitude (Feeling)          133      ?     ?   19.0  121   19.5   1.8   19.0  254   19.2   3.2   18.7
Teacher Attitude (Use)              133      ?     ?   23.8  121   27.1   2.8   28.2  254   25.9   4.3   26.2
Teacher Attitude (Increase)         133      ?     ?    9.0  121   10.8   1.2   11.2  254   10.3   1.7   10.1
Teacher Attitude (S. Feel)          133      ?   3.?   18.7  121   15.8     ?   16.0  254   15.2   3.?   15.1
Teacher On-Task (TD)                133      ?     ?   98.3  121   93.6   5.5   96.0  216   71.9  54.9   77.6
Teacher On-Task (SD)                133      ?     ?   79.0  121   90.9  17.0   98.0  216   83.2  51.8   93.5
Student On-Task (TD)                 28      ?   9.6   87.5   22   90.3  12.1   95.5   44   89.7  10.9   92.5
Student On-Task (SD)                 28   96.6   4.0   97.9   11   85.4  19.5   98.5   44   9?.1  11.0   ?7.0
Achievement Test (SD)               128     .2    .9     .1  116    -.2   1.0     .0  248     .3    .8     .3
Achievement Test (TD)               128     .2    .9     .1  116    ?.1   1.0      ?  248     .2    .9     .1
Achievement Test (Math)             128     .1    .9    ?.1    ?     .2    .9      ?  243     .3    .9     .4
Achievement Test (Total Read)       128     .2     ?      ?    ?      ?     ?      ?    ?      ?     ?      ?
Achievement Test (Total)            127     .2     ?      ?    ?    ?.1   1.0     .0  248     .3    .8     .3
Quality of Test Administration      133      ?     ?      ?  121   51.7   2.8   51.0  216   48.2   3.4   48.4
Test-Taking Skills (Wise)           121   10.8   4.0   10.7  105   10.0   4.2    9.7    ?   11.0     ?      ?
Test-Taking Skills (Deductive)      121    6.4   1.0      ?    ?      ?     ?      ?    ?      ?     ?      ?
Test-Taking Skills (Directions)     121    5.2     ?      ?  105      ?     ?      ?    ?      ?    .9      ?
Student Attitude                    121    ?.6   ?.5   11.4  105   11.5   3.0      ?  252   12.6   3.8   12.5

ᵃ ... had teachers who ... low on quality of implementation or support, ... as a second language.
Table H-7

Frequencies of All Variables

Teacher Attitude (Opinion): Mdn = 14.98; Min = 6.00; Max = 25.00
Teacher Attitude (Feeling): Mdn = 20.45; Max = 25.00
Teacher Attitude (Use): Mdn = 25.48; Min = 16.00; Max = 34.00
Teacher Attitude (Increased Use of Tests): SD = 2.36; Mdn = 10.26; Min = 4.60; Max = 15.00
Teacher Attitude (Students' Feeling): Mdn = 16.57; Min = 6.00
Teacher On-Task (Teacher-Directed Test): Mdn = 89.41; Min = 0.00
Teacher On-Task (Student-Directed Test): SD = 28.72; Min = 14.00
Student On-Task (Teacher-Directed Test): Mdn = 93.00; Min = 0.00; Max = 100.00
Student On-Task (Student-Directed Test): Mdn = 93.55; Min = 14.0; Max = 100.0
Student in Title I? No = 1073 (72%); Yes = 408 (28%)
Student in Special Education? No = 91%; Yes = 9%
Student with English as a Second Language? No = 1439 (97%); Yes = 42 (3%)

Achievement Test (Math): SD = .99; Mdn = .02; Min = -3.05; Max = 2.64
Achievement Test (Total Reading): Mdn = .21; Min = -3.48; Max = 1.64
Achievement Test (Total): Max = 2.02
Quality of Test Administration: Mdn = 50.80; Min = 18.00; Max = 58.00
Student Test-Taking Skills (Wise): Min = 0; Max = 23
Student Test-Taking Skills (Deductive): SD = 1.23; Mdn = 6.28; Min = 0; Max = 8
Student Test-Taking Skills (Directions): Max = 7
Student Attitude: Mdn = 11.59; Min = 0; Max = 24
Achievement Test Score (Student-Directed): Max = 1.50
Achievement Test Score (Teacher-Directed): Mdn = .14; Max = 1.99
Table H-7 (continued)

Teacher Support of Program: 1 = 51 (5%); 2 = 377 (37%); 3 = 584 (58%)
    Mean = 2.53; SD = .59; Mdn = 2.63; Min = 1; Max = 3
Quality of Teacher Implementation: 1 (Poor) = 160 (16%); 2 = 270 (27%); 3 (Good) = 582 (58%)
    Mean = 2.42; SD = .75; Mdn = 2.63; Max = 3
Evaluation of Project: Mean = 1.97; SD = .55; Mdn = 1.97; Min = 1; Max = 3.4
Evaluation of Filmstrips: Mean = 1.65; SD = .49; Mdn = 1.62; Min = 1; Max = 3
Evaluation of Practice Tests: Mean = 2.04; SD = .75; Mdn = 1.98; Max = 4.4
Evaluation of Communication: SD = .62; Mdn = 1.72; Min = .80; Max = 3.0
Evaluation of Data Collection: Mean = 2.07; SD = .89; Mdn = 2.06; Min = 1
Evaluation of General Impressions: Mean = 2.01; SD = .69; Mdn = 1.99; Min = 1; Max = 4.2
Evaluation of Reinforcement: Mean = 2.83; SD = 1.04; Mdn = 2.97; Min = 1.0; Max = 4.4
Evaluation of Spring Workshop: Mean = 2.18; SD = 1.30; Min = 1; Max = 5
No. of Filmstrips Viewed: Mean = 8.19; SD = 1.46; Mdn = 8.67; Min = 0; Max = 9
No. of Practice Tests Taken: Mean = 6.34; SD = 1.14; Mdn = 6.68; Min = 0; Max = 7
No. of Reinforcement Points Per Test: SD = 1.44; Mdn = 3.67; Min = .40; Max = 9.5
Mean % Correct on Practice Tests: Mean = 82.75%; Min = 7%; Max = 99%